Quality Prediction and Abnormal Processing Parameter Identification in Polypropylene Fiber Melt Spinning Using Artificial Intelligence Machine Learning and Deep Learning Algorithms

Melt spinning machines must be set up with the process parameters that yield the best end-product quality. In this study, artificial intelligence algorithms were employed to create a system that detects abnormal processing parameters and suggests strategies to improve quality. Polypropylene (PP) was selected as the experimental material, and the quality achieved by adjusting the melt spinning machine’s processing parameter settings was used as the basis for judgement. The six control factors were the processing parameters screw temperature, gear pump temperature, die head temperature, screw speed, gear pump speed, and take-up speed. The four quality characteristics were fineness, breaking strength, elongation at break, and elastic energy modulus. In the first part of our study, we applied fast deep-learning characteristic grid calculations on a 440-item historical data set to train a deep learning neural network and determine methods for multi-quality optimization. In the second part, with the best processing parameters as a benchmark, and given abnormal quality data derived from processing parameter settings deviating from these optimal values, several machine learning and deep learning methods were compared in their ability to find the settings responsible for the abnormal data, which were randomly split into a 210-item training data set and a 210-item verification data set. The random forest method proved to be the best at identifying responsible parameter settings, with an accuracy of 100% for distinguishing single from double abnormalities, 98.3% for single-factor classification, and 96.0% for double-factor classification, confirming that the diagnostic method proposed in this study can effectively predict product abnormality and find the parameter settings responsible for it.


Introduction
Modern synthetic fibers are mainly manufactured by three methods: dry spinning, wet spinning, and melt spinning. Among them, melt spinning is the most commonly used in industry due to its low cost and process stability [1,2]. However, as equipment wears and different parts of the melt spinning line are maintained or replaced, product quality can be expected to deviate from the original. Failure to achieve the preset best quality, and process abnormality in general, may also be due to a number of other factors, such as the experience of the machine operator and the parameter settings. Because of these factors, the causes of abnormality are difficult to analyze, and the industry currently relies almost completely on the expertise of technologists to solve the problem.
If a specialized analysis technique can be developed to explore the link between abnormal quality and processing parameters, it will significantly enhance the maintenance of product quality. In related work, with a large number of features as input, accuracy rates as high as 97% have been reported. Sanchez et al. [17] used a deep random forest to diagnose gearbox failures and compared it with other machine learning algorithms such as support vector machines and the k-nearest neighbor algorithm, confirming that the random forest was the best for their data. Beyond industry, random forest classification has also achieved very good results in many other fields, such as medical analysis [18] and biology [19]. In the field of artificial intelligence, effective classifiers can be developed not only with random forests but also with neural networks. Artificial neural networks use nonlinear activation functions and feed the weighted outputs of multiple neurons into deeper neuron layers, with multiple layers connected in sequence to increase learning accuracy [20]. Ali et al. [21], studying the non-stationary nonlinear characteristics of rolling bearing vibration signals, used a neural network to classify bearing defects. Their experimental results showed that this method can reliably classify defects.
In this study, we proposed a set of methodologies that engineers may use as criteria to determine whether product quality is deteriorating and to immediately assist them in identifying causes, ultimately increasing product quality and lowering manufacturing costs. As a result, the goal of this research is to find the best process parameter settings for melt spinning machines and to identify those that cause anomalous processing.

Methods and Materials
The experimental method and artificial intelligence steps are shown in Figure 1, including melt spinning experiment, data pre-processing, artificial intelligence classifier, single and double anomaly identification, and classification results.

Random Forest
The random forest is composed of decision trees, and the decision trees in a random forest are mutually independent [22]. When a new input sample arrives, each decision tree in the forest makes a judgment to predict which class the sample should belong to, and finally the decision trees vote on the class the sample is assigned. Although each individual decision tree obtained by this algorithm is very weak, the combination of the decision trees works very well; this approach is also called ensemble learning [23]. Its calculation steps are: (1) Draw a bootstrap sample of size n by randomly selecting n data points from the data set. (2) Train a decision tree on the selected n data, randomly extracting d features at each node in the decision tree, and then using those features to split the node. (3) Repeat steps 1–2 k times, possibly with improvements; the most commonly used improvement is AdaBoost. (4) Aggregate the predictions of all decision trees and decide the final classification by majority or weighted voting.
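As a minimal sketch of these steps, the snippet below builds a small forest with scikit-learn; the data is a random placeholder standing in for the quality-deviation features used later in this study, and the hyperparameter values are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(140, 4))     # placeholder quality-deviation features
y = rng.integers(0, 6, size=140)  # placeholder abnormal-parameter labels

# n_estimators plays the role of the k trees; max_features is the d
# features randomly considered at each node split.
forest = RandomForestClassifier(n_estimators=8, max_depth=4,
                                max_features="sqrt", random_state=0)
forest.fit(X, y)
print(forest.predict(X[:3]))      # each tree votes; the majority wins
```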
AdaBoost is a boosting algorithm [24] and the most commonly used one today. The idea is to increase the weight of the samples misclassified by previous decision trees, so that each newly trained decision tree focuses on the training data that is easily misclassified. Each decision tree uses weighted voting instead of an average voting mechanism: weak classifiers with higher accuracy receive larger weights, and weak classifiers with lower accuracy receive lower weights.
First, a training set $\{(x_1, y_1), \dots, (x_n, y_n)\}$ is given, and the weight of sample $i$ in round $k$ is denoted $w_i^k$. The weight of each sample for the first decision tree classifier is initialized uniformly, $w_i^1 = 1/n$. The $k$th decision tree classifier $f_k(x)$ is trained with weights $w_i^k$. Assuming that $L$ decision tree classifiers are trained, when training the $k$th one the weighted error is

$$\varepsilon_k = \sum_{i=1}^{n} w_i^k \, \mathbf{1}\!\left[f_k(x_i) \neq y_i\right],$$

the classifier weight is $\alpha_k = \frac{1}{2}\ln\frac{1-\varepsilon_k}{\varepsilon_k}$, and the sample weights are updated as $w_i^{k+1} \propto w_i^k \exp\!\left(-\alpha_k\, y_i f_k(x_i)\right)$, where $\varepsilon_k$ is the error of the $k$th decision tree classifier. Finally, $L$ decision tree classifiers $f_1(x), \dots, f_L(x)$ are obtained, and the results of all decision tree classifiers are combined by weighted vote:

$$F(x) = \mathrm{sign}\left(\sum_{k=1}^{L} \alpha_k f_k(x)\right).$$
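The updates above transcribe almost directly into code. The sketch below is an illustration only, assuming binary labels in {−1, +1} and scikit-learn decision stumps as the weak classifiers; it is not the exact implementation used in the study.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, L=10):
    """Train L stumps with the weight updates above; y must be +/-1."""
    n = len(y)
    w = np.full(n, 1.0 / n)                 # w_i^1 = 1/n
    trees, alphas = [], []
    for _ in range(L):
        tree = DecisionTreeClassifier(max_depth=1)
        tree.fit(X, y, sample_weight=w)     # train f_k with weights w_i^k
        pred = tree.predict(X)
        eps = np.sum(w * (pred != y))       # weighted error epsilon_k
        alpha = 0.5 * np.log((1.0 - eps) / max(eps, 1e-10))
        w *= np.exp(-alpha * y * pred)      # boost misclassified samples
        w /= w.sum()
        trees.append(tree)
        alphas.append(alpha)
    return trees, np.array(alphas)

def adaboost_predict(trees, alphas, X):
    votes = sum(a * t.predict(X) for a, t in zip(alphas, trees))
    return np.sign(votes)                   # weighted vote of all trees
```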

Neural Network
A neural network mimics the human brain, with a structure consisting of thousands of interconnected neurons. A neuron can be connected to multiple neurons in the following layer for output, and to multiple neurons in the preceding layer for reception. Neurons (perceptrons) are mathematical functions that multiply the input data from the input layer ($x_1, x_2, \dots$) by weights ($w_1, w_2, \dots$) and add a bias ($b$) to the weighted inputs (hidden layer). The result is then passed through an activation function ($f$) to introduce nonlinearity into the network. All incoming data points receive a weight, are multiplied and summed, and are passed to a nonlinear activation function. An example of a single-layer neural-network architecture is shown in Figure 2. The output of each neuron is

$$a = f(wp + b) \quad (6)$$

After calculating the loss between the output layer of the neural network and the correct value, the network parameters are modified through the back-propagation algorithm. Since neural networks are inherently nonlinear, with multiple inputs and multiple outputs, they are suitable for modeling complex nonlinear systems.
To minimize the loss function, neural network training often uses the gradient descent algorithm:

$$W \leftarrow W - \gamma \frac{\partial L}{\partial W} \quad (7)$$

where $W$ is the weight parameter, $\gamma$ is the learning rate, $L$ is the loss function, and $\partial L / \partial W$ is the gradient of the loss function with respect to the weight parameter.
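To make Equations (6) and (7) concrete, here is a one-neuron sketch in NumPy, assuming a sigmoid activation and a squared-error loss; both choices are for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

p = np.array([0.4, 0.7, 0.1])      # inputs x1, x2, x3
w = np.array([0.2, -0.5, 0.3])     # weights w1, w2, w3
b, gamma, target = 0.1, 0.01, 1.0  # bias, learning rate, desired output

a = sigmoid(w @ p + b)             # Equation (6): a = f(wp + b)
grad = (a - target) * a * (1.0 - a) * p  # dL/dw for L = (a - y)^2 / 2
w = w - gamma * grad               # Equation (7): W <- W - gamma * dL/dW
print(a, w)
```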

Activation Functions
Activation functions are used in neural networks to transform the weighted sum of inputs and biases, which then determines whether or not a neuron activates [25]. They are used to control the outputs of neural networks in a variety of areas, including object recognition and classification, among other domains, and early research findings demonstrate clearly that good activation function selection improves neural network results. ReLU, Mish, and sigmoid are the activation functions employed in this work.
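For reference, the three activation functions named above can be written directly in NumPy; these are the standard definitions (sigmoid and Mish also appear as Equations (8) and (9) later in the paper).

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mish(x):
    # Mish(x) = x * tanh(softplus(x)); logaddexp keeps softplus stable
    return x * np.tanh(np.logaddexp(0.0, x))
```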

Optimization Techniques
Optimization is one of the most important aspects of deep learning: it helps a model train better as the weights are modified, reducing the loss error, and it also handles the dimensionality problem during back-propagation. Ruder [26] investigated the convergence time, number of fluctuations, and parameter update rate of several stochastic gradient descent-based optimization algorithms, including SGD, SGD with momentum, Adam, and RMSProp, using varying numbers of iterations and particular values of the test function.


Materials
PP is an outstanding textile material; the polypropylene polymer Globalene 6331 was purchased from LCY Chemical Corp. (Taipei, Taiwan). The chemical structure is shown in Figure 3. It features abrasion resistance, flexibility, high strength, light weight, strong antistatic character, good chemical resistance, and low cost. Acid and alkali resistance, water repellency, quick drying, bacteria repellency, low thermal conductivity, warmth retention, a low glass transition point, low-temperature resistance, low energy consumption, low CO₂ emission, decomposability, no dyeing wastewater pollution, and oil absorption are among its advantages over PET, Nylon 6, and Nylon 66 fibers. PP fiber is now widely utilized and has gained economic relevance in home furnishings and industrial applications. PP fiber is one of the three primary synthetic fibers and is generally made via melt spinning: in a vertical or horizontal screw extruder, the resin is heated until molten, extruded through the nozzle via the metering pump, and cooled to fiber in the air [27,28].

Experiment Plan
In this study, a melt spinning machine was used as the research machine, with PP as the material. The melt spinning machine heats the material into a molten state; the melt is then conveyed through the screw and the gear pump so that it is continuously extruded through the spinning nozzle and wound onto a drum by the roller. Since the melt spinning machine comprises a feeder, screw heating zone, gear pump, spinning nozzle, and take-up system, as shown in Figure 4, the screw temperature, gear pump temperature, die head temperature, screw speed, gear pump speed, and take-up speed were chosen as the process parameters for examining the quality of the fiber process. At the same time, a group of neural networks was trained on historical experimental data to predict the multiple quality characteristics produced by various processing parameter values. Half of the quality characteristics obtained were used as input feature values for training a quality abnormality classifier. After training, the other half were input as test samples to confirm whether the classifier could identify the processing parameters responsible for the quality abnormality. Assuming the identification was successful, the diagnosis system for the melt spinning machine was complete.

Materials Analysis
PP was selected as the experimental material in this study because of its characteristics, namely, easy processing, mechanical strength, strong elasticity, resistance to staining, lightness, and low price. Before the experiment, it was necessary to find its melting point and thermal cracking point in order to plan the machine settings. A thermogravimetric analyzer was used to determine the thermal cracking point, and a differential thermal analyzer to measure the melting point. The thermogravimetric analysis diagram and differential scanning calorimetry (DSC) diagram are shown in Figures 5 and 6. It can be seen that the thermal cracking point of PP is about 400 °C, so this temperature was not exceeded during the experiment, as exceeding it risked contaminating the machine. The melting point is about 166 °C, so the temperature was kept above this level during the experiment; temperatures below it also risked damaging the machine.

Experimental Data
A total of 440 historical melt spinning measurement records were used as data in the study. There were six processing parameters, namely, screw temperature, gear pump temperature, die head temperature, screw speed, gear pump speed, and take-up speed. The corresponding quality characteristics were fineness, breaking strength, elongation at break, and modulus of resilience. These 440 samples were randomly split into a 330-item training data set and a 110-item validation data set, using the k-fold cross-validation method to produce the best model for subsequent analysis when the model was finally optimized.
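A sketch of this split with scikit-learn utilities is shown below; the random arrays merely stand in for the 440 historical records, and the number of folds is an assumption, since the paper does not state k.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

rng = np.random.default_rng(0)
X = rng.random((440, 6))   # six processing parameters per record
y = rng.random((440, 4))   # four quality characteristics per record

# 330 training / 110 validation items, as described above
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=110,
                                            random_state=0)

# k-fold cross-validation over the training portion (k assumed here)
for i_tr, i_te in KFold(n_splits=4, shuffle=True,
                        random_state=0).split(X_tr):
    pass  # fit a candidate model on X_tr[i_tr], score it on X_tr[i_te]
```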

Data Processing
Because the input independent variables and the output dependent variables have different units of measurement and very different value ranges, they are not directly comparable. To solve this problem, the data were first normalized for this experiment. The ranges of the original independent processing parameters are shown in Table 1.
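A minimal min-max normalization sketch follows; the minima and maxima below are assumptions for illustration, and the true ranges are those of Table 1.

```python
import numpy as np

# Assumed (illustrative) minima and maxima for the six parameters in the
# order: screw temp, gear pump temp, die head temp, screw speed,
# gear pump speed, take-up speed -- the true ranges are in Table 1.
lo = np.array([170.0, 210.0, 230.0, 5.0, 10.0, 500.0])
hi = np.array([190.0, 230.0, 250.0, 10.0, 20.0, 900.0])

def normalize(x):
    """Min-max scale each parameter into [0, 1]."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

print(normalize([180, 220, 240, 7.5, 15, 700]))
```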

Neural Network Training
A neural network was used for the prediction of the multiple quality characteristics. The input variables were the processing parameters of the melt spinning machine: screw temperature, gear pump temperature, die head temperature, screw speed, gear pump speed, and take-up speed. The output variables were the corresponding quality characteristics: fineness, breaking strength, elongation at break, and modulus of resilience.
The architecture of the neural network is shown in Figure 7. The number of hidden layers and the number of neurons in each layer of the neural network were variables used to find the best results using the grid search method.
In order to avoid over-fitting, this study added the dropout method between hidden layers for regularization; however, the experimental results showed that dropout regularization only slightly improved the predicted values. In order to normalize output values to the range 0 to 1, the output layer adopted the sigmoid function, as shown in Equation (8):

$$\sigma(x) = \frac{1}{1 + e^{-x}} \quad (8)$$

The activation function of the remaining layers differed from the commonly-used ReLU activation function [29]; the novel Mish function was used instead [30], as shown in Equation (9):

$$\mathrm{Mish}(x) = x \tanh\!\left(\ln(1 + e^{x})\right) \quad (9)$$

The experimental process is shown in Figure 8.
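One way to realize this architecture is sketched below in PyTorch. The depth, layer widths, and dropout rate here are placeholders, not the grid-searched configuration reported in Tables 2 and 3.

```python
import torch
import torch.nn as nn

# Six process parameters in, four normalized quality characteristics out.
# Hidden layers use Mish, dropout sits between them, and the output layer
# uses sigmoid so predictions fall in the range 0..1 (Equation (8)).
model = nn.Sequential(
    nn.Linear(6, 32), nn.Mish(), nn.Dropout(p=0.2),
    nn.Linear(32, 32), nn.Mish(), nn.Dropout(p=0.2),
    nn.Linear(32, 4), nn.Sigmoid(),
)

x = torch.rand(1, 6)   # one normalized parameter setting
print(model(x))        # four predicted quality values in (0, 1)
```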
As can be seen from Figure 8, the Mish function behaves differently from the commonly-used ReLU and sigmoid functions [31]. It converges the loss effectively and quickly, and it is less prone to network weakening when the loss increases over a long run of iterations.

Evaluation Criteria and Training Results
In order to evaluate the performance of the neural network, the study adopted the commonly-used statistical measures mean absolute error (MAE) and root mean squared error (RMSE). The formulas are shown in Equations (10) and (11); the lower the value, the better the performance of the neural network.

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right| \quad (10)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2} \quad (11)$$

where $N$ represents the number of quality characteristics, and $\hat{y}_i$ and $y_i$ represent the predicted value and the actual value, respectively.
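Equations (10) and (11) translate directly into NumPy:

```python
import numpy as np

def mae(y_true, y_pred):
    # Equation (10): mean absolute error
    return np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true)))

def rmse(y_true, y_pred):
    # Equation (11): root mean squared error
    return np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))
```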

The mean absolute error and root mean square error of the training results for the validation data sets with the neural network grid search are shown in Tables 2 and 3, respectively. It can be observed from Tables 2 and 3 that the mean absolute error and the root mean square error can be reduced by increasing the number of neurons and hidden layers. However, if these are increased too much, overfitting results, and the error on the validation data set increases. After using the grid search method [32] to obtain the optimal number of hidden layers and neurons in the neural network, we added dropout regularization between the hidden layers to reduce the problem of overfitting, as shown in Figure 9.
Despite trying to reduce overfitting in the neural network through the use of dropout regularization, the experimental results showed it only slightly reduced errors in predicted values on the validation data set. We then used three improved gradient descent algorithms for the final optimization of the neural network, namely SGDM [33], RMSProp [34], and Adam [35]. In order to address the common problem of basic gradient descent falling into a local optimum and being unable to escape, we made use of the SGDM gradient descent method with momentum.
$$V \leftarrow \beta V - \gamma \frac{\partial L}{\partial W}, \qquad W \leftarrow W + V$$

The above equations give the SGDM gradient descent formula with momentum. Compared with the basic gradient descent method, a directional velocity $V$ and momentum $\beta$ are added.
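The reconstructed update rule maps to a few lines of NumPy; the γ and β values here are illustrative defaults, not the study's settings.

```python
import numpy as np

def sgdm_step(W, V, grad, gamma=0.01, beta=0.9):
    """One SGDM update: V accumulates a momentum-weighted history of
    gradients, which helps the search escape shallow local minima."""
    V = beta * V - gamma * grad
    W = W + V
    return W, V
```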
Starting by testing with a smaller number of iterations, as shown in Figure 10, we found that the RMSProp algorithm reduced error rapidly at the beginning, but its performance declined after a larger number of iterations. The Adam and SGDM algorithms, on the other hand, behaved similarly and converged effectively to a smaller error. Therefore, the number of iterations was extended in the experiment, and only the performance of the Adam and SGDM algorithms was compared. The results are shown in Figure 11. After 1000 iterations, the Adam and SGDM algorithms had quite similar training curves, with mean absolute errors of 0.0713 and 0.0709, and root mean square errors of 0.0917 and 0.0905, respectively. Although there was not much difference between the two, the SGDM algorithm reduced the loss slightly more effectively than the Adam algorithm. The detailed optimal training process is shown in Table 4. Finally, the quality characteristic values predicted by the neural network are shown in Table 5 as figures between 0 and 1 calculated by the sigmoid function. The results show that the neural network effectively predicted the multiple quality characteristic values that particular processing parameter settings would produce. Not only could expected results be predicted before an experimental run, but the search for optimized parameters could also be carried out further. As can be seen from Figures 12-15, the neural network model could successfully and effectively predict the effect of various combinations of processing parameters on the corresponding fineness, breaking strength, elongation at break, and modulus of resilience quality characteristics. Compared with the traditional Taguchi analysis method, which requires setting up orthogonal arrays, carrying out main effect analysis, analysis of variance, and confirmatory tests, among other time-consuming steps [5], the neural network model conducts self-training and learning with past historical data, meaning it can analyze the data more efficiently.
It was the aim of this study to obtain optimal processing parameters for minimum fineness and maximum breaking strength, elongation at break, and modulus of resilience. Therefore, because of the high speed at which the deep learning neural network could be evaluated (a single prediction takes less than one millisecond), the grid search method was used to exhaustively find the combination of processing parameters that minimizes fineness and maximizes breaking strength, elongation at break, and modulus of resilience. The result of the search was that when the screw temperature is 180 °C, the gear pump temperature is 220 °C, the die head temperature is 240 °C, the screw speed is 7.5 rpm, the gear pump speed is 15 rpm, and the take-up speed is 700 rpm, the output quality characteristics are a fineness of 243 Denier (D), a breaking strength of 3.4 N/mm², a breaking elongation of 643%, and an elastic energy modulus of 9.13 N/mm², as shown in Table 6.
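The exhaustive search can be sketched as follows. Here `predict_quality` is a hypothetical stand-in for the trained network, the grid values are illustrative, and the simple additive score stands in for the paper's multi-quality criterion.

```python
import itertools
import numpy as np

def predict_quality(params):
    # Hypothetical stand-in for the trained network: returns normalized
    # (fineness, strength, elongation, modulus) for a parameter vector.
    return 0.3, 0.6, 0.7, 0.5

grids = [np.linspace(170, 190, 5),   # screw temperature (illustrative)
         np.linspace(210, 230, 5),   # gear pump temperature
         np.linspace(230, 250, 5),   # die head temperature
         np.linspace(5, 10, 5),      # screw speed
         np.linspace(10, 20, 5),     # gear pump speed
         np.linspace(500, 900, 5)]   # take-up speed

best, best_score = None, -np.inf
for params in itertools.product(*grids):
    fineness, strength, elong, modulus = predict_quality(params)
    # Simple additive score: maximize three qualities, minimize fineness
    score = strength + elong + modulus - fineness
    if score > best_score:
        best, best_score = params, score
print(best)
```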


Creating Historical Data and Abnormal Samples
Having obtained the best parameters with the neural network grid search method, this set of processing parameters was used as the experimental parameters to produce one data set of 20 normal samples, as shown in Table 7. Then, to produce two more abnormal experimental data sets, the first with one abnormal parameter setting and the second with two abnormal settings, one or two processing parameters were changed in sequence, as shown in Table 8. For the single-factor abnormal samples of setting A, the value was set to Abnormal 1 for the first group of samples and Abnormal 2 for the second, with the remaining processing parameters staying the same. For the two-factor abnormal samples, two processing parameters were changed at a time, with the others staying the same; for samples where settings A and B were changed together, both were first set to Abnormal 1 and then to Abnormal 2. The remaining processing parameters were changed according to the same rule. In this study, the generation of 20 samples was taken as the standard for both normal and abnormal data sets, so 20 samples were produced for the best parameters, 20 samples for each single-factor abnormal processing parameter (both Abnormal 1 and Abnormal 2), and 20 samples for each two-factor abnormal pairing (both Abnormal 1 and Abnormal 2). In total, 20 normal samples, 120 (= 6 × 20) single-factor abnormal samples, and 300 (= 15 × 20) two-factor abnormal samples were obtained.

Abnormal Processing Parameter Classifier Model Training
In order to determine which processing parameters cause quality characteristic abnormalities, and to improve the process yield of the melt spinning machine, the 420 samples of the abnormal data sets and 20 of the normal data set were applied to train an artificial intelligence classifier, using the neural network to generate quality characteristic predictions.
The input feature $x_i$, as shown in Equation (14), was the difference between each of the four quality characteristics $y_i$ (fineness, breaking strength, elongation at break, elastic energy modulus) actually measured in the abnormal sample data and the corresponding prediction $\hat{y}_i$ of the neural network:

$$x_i = y_i - \hat{y}_i \quad (14)$$

as shown in Table 9. The output was the one-hot encoded classification result for the processing parameters screw temperature, gear pump temperature, die head temperature, screw speed, gear pump speed, and take-up speed, as shown in Table 10. When training the classifier, if all the data are directly classified as the result of one or two factors, errors are likely, because some values are too close together to allow the abnormal processing parameter settings to be judged correctly. In order to improve the accuracy of the classifier, it was necessary to first divide the abnormal samples into one-factor and two-factor groups and then treat them separately. Therefore, this study used a total of three classifiers to make predictions: when classifying single versus double, the output was a single- or two-factor abnormality; for single-factor classification, the output was the corresponding abnormal processing parameter type; and for two-factor classification, the output was the corresponding pair of abnormal processing parameter types.
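The three-classifier pipeline can be sketched as follows; the residual of Equation (14) is the shared feature, and the classifier objects are placeholders assumed to have been fitted on the training split beforehand.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Three classifiers, assumed fitted on the training split beforehand:
stage1 = RandomForestClassifier()   # 0 = normal, 1 = single, 2 = double
single = RandomForestClassifier()   # one of the 6 parameters
double = RandomForestClassifier()   # one of the 15 parameter pairs

def diagnose(y_measured, y_predicted):
    x = (np.asarray(y_measured) - np.asarray(y_predicted)).reshape(1, -1)
    kind = stage1.predict(x)[0]     # Equation (14) residual as feature
    if kind == 1:
        return "single", single.predict(x)[0]
    if kind == 2:
        return "double", double.predict(x)[0]
    return "normal", None
```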

Classifier Comparison
Because relevant studies in the literature have achieved outstanding performance with a number of different classifiers, this study compared these commonly-used machine learning methods on the data used here. Classification methods such as decision trees, random forests, and support vector machines, as well as neural network methods such as deep learning, are all possible bases for comparison. Since each of these classifier methods has many hyperparameters to adjust, the grid search method was used to find the best parameter combination for each method. A random forest architecture diagram is shown in Figure 16.

Single and Double Identification
For each sample, the differences between the four actual quality characteristics of the abnormal sample data and the four quality characteristics predicted by the neural network formed an array of four values used as the input feature of the classifier. There were 420 abnormal samples and 20 normal samples in the data set, which was randomly divided into a 220-item training data set and a 220-item verification data set. First, the training data set was used to construct a model, and then the verification data set was used for testing. The purpose of the verification data set was to determine whether the melt spinning abnormality diagnosis system could adequately detect the processing parameters responsible for abnormalities. Training also used grid search and cross-validation to find the best settings for each classifier. The classifiers were evaluated by the detection success rate, the ratio of the number of correctly identified samples to the total number of samples in the validation data set, as shown in Equation (15); the larger the value, the better. The detection success rates of the various classification methods are shown in Table 11. The success of the random forest classification method can be seen in the confusion matrix in Figure 17, where 0 means no abnormality, 1 means a single abnormality, and 2 means a double abnormality. On the validation data set, the method was the best at identifying single and double abnormalities: among the 220 samples, it missed no cases of no abnormality, single abnormality, or double abnormality, and all predictions were correct. As for the grid search hyperparameters, a random forest with eight decision trees and a maximum depth of four worked best.
$$\text{Detection success rate} = \frac{\text{Number of correctly identified samples}}{\text{Total number of samples}} \quad (15)$$
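Equation (15) and the confusion matrix can be computed with scikit-learn; the labels below are placeholders standing in for the 220-sample verification set.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Placeholder labels standing in for the verification set:
# 0 = no abnormality, 1 = single abnormality, 2 = double abnormality.
y_val = [0, 1, 2, 2, 1, 0, 2, 1]
pred  = [0, 1, 2, 2, 1, 0, 2, 2]

rate = accuracy_score(y_val, pred)    # Equation (15): correct / total
print(f"Detection success rate: {rate:.1%}")
print(confusion_matrix(y_val, pred))  # rows: true class, cols: predicted
```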

One-Factor Classification
Single-factor abnormality discrimination is carried out after single-double identification has indicated that there is one abnormal parameter setting. In developing this classifier, there were a total of 120 samples; 60 were randomly selected for training, and the other 60 formed the verification data set. The results for single-factor classification, shown in Table 12, were similar to the single-double identification ones, and random forest classification was again the best. From the confusion matrix in Figure 18, where 0 to 5 represent screw temperature, gear pump temperature, die head temperature, screw speed, gear pump speed, and take-up speed, respectively, it can be seen that only one gear pump speed abnormality in the 60-item verification data set was misjudged, as an abnormal screw temperature, so the detection success rate was as high as 98.3%. Identification by the random forest classifier of which processing parameter caused the abnormality worked very well. As for the grid search hyperparameters, a random forest with twelve decision trees and a maximum depth of four performed best.



Two-Factor Classification
Finally, a two-factor abnormality classifier was developed with a total of 300 samples, 150 randomly selected for the training data set and 150 for the verification data set. The experimental results are shown in Table 13. It can be seen that the random forest classifier performed much better than the decision tree and neural network classifiers. Its confusion matrix is shown in Figure 19, with 0 to 4 representing the combinations of screw temperature with the remaining five abnormal processing parameters, 5 to 8 the combinations of gear pump temperature with the remaining four, 9 to 11 the combinations of die head temperature with the remaining three, 12 and 13 the combinations of screw speed with the remaining two, and 14 the combination of gear pump speed with take-up speed. Only 6 verification samples were misjudged, so the detection success rate was as high as 96.0%. These 6 misjudged samples involved different combinations of abnormal processing parameters, with no parameter appearing in more than one sample, indicating that model overfitting is not a problem. As for the grid search hyperparameters, a random forest with twenty-four decision trees and a maximum depth of six worked best.

It can be concluded that the random forest achieved the best accuracy rate, with results similar to those in the literature. With comparatively few data samples, it also exhibited greater resistance to overfitting. This study's random forest classifier was compared with a decision tree classifier using the four values obtained by the RAM method as input features in their ability to identify abnormal processing parameters. As can be seen from the final overall accuracy rates in Table 14, the results for both are outstanding. However, using this study's deep learning neural network and random forest classifier for abnormal processing parameter detection avoids the need to first compute Hotelling's T² for abnormal product detection, by directly classifying the data on the basis of the results obtained from the deep learning neural network, whereas using the decision tree with the RAM method requires much more process calculation [36,37]. For example, if there is some bias in the calculation of Hotelling's T² at the beginning, it will lead to indirect errors in the RAM method and in the feature input of the decision tree, resulting in misjudgment of the final detection result. In addition, the calculation time required for the final abnormal processing parameter detection in this study is only 0.08 s, making it more efficient by comparison.

Conclusions
This study applied a deep learning neural network and random forest from machine learning in artificial intelligence to the optimal quality prediction of multiple quality parameters and quality abnormality diagnosis of melt spinning machines. It included six processing parameters and four qualities. The conclusions are as follows.
(1) A deep learning neural network was used to generate quality predictions, trained on a 440-item historical data set, and multiple quality optimization parameters were searched for using rapid deep-learning characteristic grid calculations. Compared with the traditional Taguchi analysis method, the neural network model conducts self-training and learning using past historical data, which means the research can proceed faster, analysis is more efficient, and conclusions are more robust, because a calculation error in one step will not affect the overall detection system.
(2) This research compared several artificial intelligence machine learning and deep learning classifiers that have obtained outstanding results in the related literature, and finally selected the random forest as the best, because it is an ensemble learning classifier and is resistant to overfitting. Its ability to detect the cause of quality problems was better than that of the other classifiers: the success rate of single and double identification was 100%, the success rate of single-factor classification was 98.3%, and the success rate of double-factor classification was 96.0%. The proposed method thus offers an effective way to identify the problematic machine settings causing quality-control problems once the engineer has measurements of the abnormality, so that the settings can be quickly modified to improve production yield.

(3) This study applied artificial intelligence methods to the development of an abnormal processing parameter identification system for PP fiber melt spinning, which can quickly find abnormal settings and reduce unnecessary cost and waste. In the future, different online detection systems matching the capabilities of this system for various other kinds of material will be added to the resources available to production engineers seeking to apply the developed identification system for selection and evaluation.