Hybrid NHPSO-JTVAC-SVM Model to Predict Production Lead Time

In the shipbuilding industry, each production process has a respective lead time; that is, the duration between start and finish times. Lead time is necessary for high-efficiency production planning and systematic production management. Therefore, lead time must be accurate. However, the traditional method of lead time management is not scientific because it only references past records. This paper proposes a new self-organizing hierarchical particle swarm optimization (PSO) algorithm with jumping time-varying acceleration coefficients (NHPSO-JTVAC)-support vector machine (SVM) regression model to increase the accuracy of lead-time prediction by combining the advanced PSO and SVM models. Moreover, this paper compares the prediction results of each SVM-based model with those of other conventional machine-learning algorithms. The results demonstrate that the proposed NHPSO-JTVAC-SVM model can achieve further meaningful enhancements in terms of prediction accuracy. The prediction performance of the NHPSO-JTVAC-SVM model is also better than that of the other SVM-based models or other machine learning algorithms. Overall, the NHPSO-JTVAC-SVM model is feasible for predicting the lead time in shipbuilding.


Introduction
In the shipbuilding industry, an essential part of scientific management is lead time, which is necessary for shipyards to arrange production plans, particularly in production organization and production progress control [1,2]. Additionally, lead time is closely related to the production efficiency of frontline manufacturing workers. The rationality of its arrangement directly affects workers' production enthusiasm, thereby affecting product quality and labor productivity [3]. For example, if the evaluation of lead time is insufficient and management has not prepared an appropriate plan for construction peaks, the construction cycle will be prolonged. In that case, to avoid affecting the construction cycle, workers must operate overtime for long periods, resulting in a decline in construction quality. Conversely, if the lead time is overestimated, excess construction capacity and wasted human resources result [4]. Consequently, lead time should be arranged reasonably, which means that the planned lead time must be as close as possible to the actual lead time in the shipyards' production planning stage. However, because shipbuilding is a labor-intensive industry, there are significant differences between planned and actual lead times (Figure 1). Therefore, to rationalize the lead-time arrangement, lead-time prediction becomes particularly critical [2,4,5].
For many years, managers frequently used the experience evaluation method to specify lead time [6]. However, this method is time-consuming and inefficient. Thus, production planning and scheduling (PPS) cannot be properly organized, making shipyard management ineffective [7]. Lead time is affected by various factors and restricted by various conditions [8]. Several researchers have studied the lead time of production, and some results have been achieved [7,9,10].
Appl. Sci. 2021, 11, x FOR PEER REVIEW

In recent research, machine learning (ML) has been widely applied in the prediction of production lead time to understand the complex relationship between lead time and its affecting factors. Gyulai et al. [9] proposed using ML algorithms to predict the lead time of jobs in the manufacturing flow-shop environment; the results indicated that ML algorithms can sufficiently understand the non-linear relationship, and they obtained good prediction accuracy from ML models. Lingitz et al. [7] analyzed the key features of lead time, provided importance scores, and developed ML models to predict lead time in the semiconductor manufacturing industry. In particular, Jeong et al. [10] attempted to improve production management capabilities by analyzing the lead time based on spool fabrication and painting datasets. They applied ML algorithms and compared the performance of each.
In ML, the SVM algorithm, proposed by Vapnik [11] in 1995, is widely used in the prediction field. Because it is based on statistical learning theory and the principle of structural risk minimization, an SVM can theoretically converge on the global optimal solution of a problem. Moreover, it exhibits unique advantages in solving small-sample and non-linear problems. It has strong generalization ability and has become a popular research topic in the field of industrial forecasting. Thissen et al. [12] applied an SVM model to predict time series. They demonstrated that the SVM model performs well in time-series forecasting. Zhang et al. [13] proposed using an SVM model to forecast the short-term load of an electric power system. They demonstrated that the forecast performance of the SVM model was better than that of a back-propagation neural network (BPNN). Astudillo et al. [14] used an SVM model to predict copper prices. The results indicated that the SVM model can predict copper-price volatilities near reality.
However, the disadvantage of an SVM is that it is too sensitive to its parameters, and an efficient SVM model can be built only after its parameters are carefully selected [15]. Therefore, many researchers have proposed methods for optimizing SVM parameters (Table 1). For instance, Yu et al. [16] combined an SVM model with a PSO algorithm to predict man-hours in aircraft assembly. The forecasting results indicated that the PSO-SVM model was significantly better than the BPNN model. Wan et al. [17] suggested applying the PSO-SVM hybrid model to predict the risk of an expressway project. The prediction results showed that the proposed model was more accurate and better than the traditional SVM model. Lv et al. [18] used PSO-SVM and grid search (GS)-SVM models to predict steel corrosion. Compared with the GS-SVM model, the results showed that the PSO-SVM steel corrosion prediction model was more accurate. Additionally, Luo et al. [19] proposed the use of a genetic algorithm (GA) to optimize an SVM model. The overall results indicated that the GA is an excellent optimization algorithm for increasing the prediction accuracy of an SVM.
In the field of landslide groundwater level prediction, the GA-SVM model was proposed by Cao et al. [20]. The results showed that the GA-SVM model can understand the relationship between groundwater level fluctuations and influencing factors well. Moreover, other researchers have combined other meta-heuristic algorithms, such as the bat algorithm (BA) and the grasshopper optimization algorithm (GOA), with SVMs and obtained good results [21,22]. Unlike other studies, this paper proposes the application of a new self-organizing hierarchical PSO with jumping time-varying acceleration coefficients (NHPSO-JTVAC) algorithm, an advanced version of the PSO algorithm, to optimize the parameters of an SVM. Moreover, the NHPSO-JTVAC-SVM model is proposed to predict the lead time in the shipyard's block assembly and pre-outfitting processes. The remainder of this paper is organized as follows: Section 2 introduces the algorithm principle of the hybrid predictive model. Section 3 describes the construction of the model. Section 4 discusses the experimental results, and Section 5 summarizes and concludes the paper.

SVM
For non-linear regression problems, assume the training data {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where n is the total number of training samples. The regression concept of SVM is to determine a non-linear mapping from the input to the output and map the data to a high-dimensional feature space, in which the training samples can be regressed through a regression equation f(x), where f(x) can be expressed as the following equation [23]:

f(x) = w^T φ(x) + b    (1)

where w represents the weight vector, φ(·) denotes the non-linear mapping, and b represents the bias term.
The SVM problem can be described as solving the following optimization problem [24,25]:

min_{w,b,ξ,ξ*}  (1/2)‖w‖² + C Σ_{i=1}^{n} (ξ_i + ξ_i*)    (2)

subject to:

y_i − w^T φ(x_i) − b ≤ ε + ξ_i,
w^T φ(x_i) + b − y_i ≤ ε + ξ_i*,
ξ_i, ξ_i* ≥ 0, i = 1, 2, ..., n    (3)

where C is the penalty parameter; ξ_i and ξ_i* are the slack variables; and ε is defined as the tube width. The ε-insensitive loss function that controls the regression error is defined by the following formula [26]:

L_ε(y, f(x)) = 0 if |y − f(x)| ≤ ε; otherwise, L_ε(y, f(x)) = |y − f(x)| − ε    (4)

Next, the SVM problem can be transformed into a dual-optimization problem [27]:

max_{α,α*}  −(1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} (α_i − α_i*)(α_j − α_j*) K(x_i, x_j) − ε Σ_{i=1}^{n} (α_i + α_i*) + Σ_{i=1}^{n} y_i (α_i − α_i*)    (5)

subject to:

Σ_{i=1}^{n} (α_i − α_i*) = 0,  0 ≤ α_i, α_i* ≤ C, i = 1, 2, ..., n    (6)

where α_i and α_i* are the Lagrange multipliers. Finally, the SVM regression function can be obtained from the following equation [28]:

f(x) = Σ_{i=1}^{n} (α_i − α_i*) K(x_i, x) + b    (7)

where K(x_i, x) is the kernel function of the SVM model, defined through the mapping φ as

K(x_i, x_j) = φ(x_i)^T φ(x_j)    (8)

According to experience, when solving complex high-dimensional sample problems, the radial basis function (RBF, Equation (9)) kernel is better than other kernel functions. Therefore, the RBF is used as the kernel function in this study [29]:

K(x_i, x) = exp(−‖x − x_i‖² / (2σ²))    (9)
As shown above, the three most important parameters (C, ε, σ) of the SVM nonlinear regression function must be determined by the user.Selecting appropriate values is challenging.To solve this problem, the optimization algorithm is described below.
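To make the role of these three parameters concrete, the following minimal sketch fits an RBF-kernel support vector regressor using scikit-learn's SVR. The data and the particular values of C, ε, and σ are illustrative assumptions, and σ is passed through scikit-learn's gamma parameter via gamma = 1/(2σ²):

```python
import numpy as np
from sklearn.svm import SVR

# Toy 1-D regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0.0, 0.1, size=200)

# The three user-chosen SVM parameters: penalty C, tube width epsilon,
# and RBF width sigma (mapped to scikit-learn's gamma = 1 / (2 * sigma^2)).
C, epsilon, sigma = 10.0, 0.01, 0.5
model = SVR(kernel="rbf", C=C, epsilon=epsilon, gamma=1.0 / (2.0 * sigma**2))
model.fit(X, y)
pred = model.predict(X)
```

Different (C, ε, σ) triples can change the fit quality considerably, which is precisely why the parameter search described next matters.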

NHPSO-JTVAC: An Advanced Version of PSO

PSO Algorithm
In 1995, inspired by the flocking behavior of birds, the PSO algorithm was introduced and developed by Kennedy and Eberhart [30]. It is a search algorithm used to solve optimization problems in computational mathematics. It is also one of the most classic swarm intelligence algorithms because of its fast convergence and simple implementation [31].
The PSO algorithm is conducted by first initializing a group of random particles and then determining the optimal solution through iteration. In each iteration, the particles track two extreme values to update themselves: the personal best (p-best) and global best (g-best) values. Each particle updates its speed and position according to the above two extreme values. If, in a D-dimensional target search space, the population number is m, then the position of the i-th particle in the d-th dimension is x_i,d, its velocity is v_i,d, the current p-best position of the particle is p_i,d, and the current g-best position of the entire particle swarm is p_g,d. Each particle's velocity and position are updated according to the following formulations [32]:

v_i,d ← ω·v_i,d + c_1·r_1·(p_i,d − x_i,d) + c_2·r_2·(p_g,d − x_i,d)    (10)

x_i,d ← x_i,d + v_i,d    (11)

where r_1 and r_2 are random numbers following the uniform (0, 1) distribution; c_1 and c_2 are learning factors; and ω is the inertia weight that controls the current velocity of the particle, and its value is non-negative. The larger the value of ω, the greater the particle's velocity, and the particle will perform a global search with a more significant step size; for smaller values of ω, the particle tends to perform a finer local search. To balance global and local search capabilities, ω generally assumes a dynamic value. In addition, the linearly decreasing inertia weight (LDIW) strategy is most commonly used to determine the value of ω [33].
ω = ω_max − (ω_max − ω_min)·(Iter / Iter_max)    (12)

where Iter is the current number of iterations, Iter_max is the maximum number of iterations, and ω_max and ω_min are the maximal/minimal inertia weights, frequently set to 0.9 and 0.4, respectively [34].
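The update rules above can be sketched compactly. The following is a minimal PSO implementation assuming the commonly cited settings (ω decreasing linearly from 0.9 to 0.4, learning factors c_1 = c_2 = 2.0); the sphere test function, bounds, and names are illustrative choices, not part of the paper:

```python
import numpy as np

def pso(fitness, dim, n_particles=20, max_iter=200, lb=-5.0, ub=5.0,
        c1=2.0, c2=2.0, w_max=0.9, w_min=0.4, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lb, ub, (n_particles, dim))        # positions
    v = np.zeros((n_particles, dim))                   # velocities
    pbest = x.copy()                                   # personal bests
    pbest_val = np.array([fitness(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()         # global best
    for it in range(max_iter):
        w = w_max - (w_max - w_min) * it / max_iter    # LDIW, Eq. (12)
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity and position updates, Eqs. (10) and (11).
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lb, ub)
        vals = np.array([fitness(p) for p in x])
        better = vals < pbest_val
        pbest[better] = x[better]
        pbest_val[better] = vals[better]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())

# Minimize the sphere function in 3 dimensions.
best_pos, best_val = pso(lambda p: float(np.sum(p ** 2)), dim=3)
```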

NHPSO-JTVAC Algorithm
The PSO algorithm has a fast convergence speed, but it sometimes falls into a local optimum, and there is no guarantee that it can search for the optimal solution [34].
To solve the above problems, HPSO-TVAC was proposed by Ratnaweera et al. [35] as an efficient improvement of the classic PSO algorithm. Ghasemi et al. [36] proposed an enhanced version of HPSO-TVAC called NHPSO-JTVAC, which performs better than the original HPSO-TVAC algorithm. To avoid particles falling into local optima, they are afforded the ability to suddenly jump out during the algorithm iteration according to Equations (13)-(15), in which the acceleration coefficient c^Iter decreases from c^1 = c_i = 0.5 in the first iteration to c^(Iter_max) = c_f = 0 in the last iteration, and w is defined as a standard normal random value. Unlike Equation (10), the new search equation is given as Equation (16), in which p^Iter_r,d represents the best personal solution of a randomly selected particle (such as the r-th particle).

Applying NHPSO-JTVAC to SVM
To select the appropriate parameters for an SVM, the NHPSO-JTVAC algorithm proposed in Section 2.2.2 was applied to optimize the parameters of the SVM. Figure 2 illustrates the SVM flow chart based on the NHPSO-JTVAC algorithm.


• Preprocess the data, and then split the dataset randomly into a training and test set (8:2).
• Randomly initialize the velocity and position of the particles, where the position vector (3-dimensional) represents the three parameters (C, ε, σ) of the SVM.
• Calculate the fitness value of each particle and determine the current p-best and g-best positions. The fitness function selected in this study was the mean absolute percentage error (MAPE) function (Equation (17)):

MAPE = (100%/m) Σ_{i=1}^{m} |(y_i − ŷ_i)/y_i|    (17)

where m is the number of training samples, y_i is the actual value, and ŷ_i is the predicted value. Figure 3 illustrates the concept of k-fold cross-validation (CV). To prevent the model from overfitting, a 5-fold CV method was adopted in this study [37].
• For each particle, compare its fitness value with that of the p-best position it has experienced. If the former is better, use it as the current p-best position.
• For each particle, compare its fitness value with that of the g-best position. If the former is better, replace the g-best with it.
• Calculate and update the velocity and position of each particle.
• If the termination condition is not satisfied, return to step (b); otherwise, the optimal solution is obtained, and the algorithm ends.
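The steps above can be sketched end to end as follows. This is a simplified stand-in: a plain PSO (rather than NHPSO-JTVAC, whose jumping coefficient schedule is omitted here) searches log10-scaled ranges for (C, ε, σ) with 5-fold CV MAPE as the fitness; the dataset, swarm settings, and names are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Illustrative data with strictly positive targets (MAPE requires y != 0).
X = rng.uniform(0.0, 2.0, size=(120, 3))
y = 3.0 + 2.0 * X[:, 0] + X[:, 1] + rng.normal(0.0, 0.1, size=120)

# Search bounds in log10 space: C in [1e0, 1e3], eps in [1e-3, 1e0],
# sigma in [1e-2, 1e1] (the ranges used later in this paper).
LB = np.array([0.0, -3.0, -2.0])
UB = np.array([3.0, 0.0, 1.0])

def fitness(pos):
    """5-fold CV MAPE of an SVR decoded from a log10-scaled position."""
    C, eps, sigma = 10.0 ** pos
    model = SVR(kernel="rbf", C=C, epsilon=eps, gamma=1.0 / (2.0 * sigma**2))
    pred = cross_val_predict(model, X, y, cv=5)
    return float(np.mean(np.abs((y - pred) / y)) * 100.0)  # MAPE (%)

# Compact PSO loop over the 3-dimensional parameter space.
n, iters = 10, 15
pos = rng.uniform(LB, UB, (n, 3))
vel = np.zeros((n, 3))
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)].copy()
for it in range(iters):
    w = 0.9 - 0.5 * it / iters                       # linearly decreasing inertia
    r1, r2 = rng.random((n, 3)), rng.random((n, 3))
    vel = w * vel + 2.0 * r1 * (pbest - pos) + 2.0 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, LB, UB)
    vals = np.array([fitness(p) for p in pos])
    better = vals < pbest_val
    pbest[better], pbest_val[better] = pos[better], vals[better]
    gbest = pbest[np.argmin(pbest_val)].copy()

best_C, best_eps, best_sigma = 10.0 ** gbest
best_mape = float(pbest_val.min())
```

Substituting a different swarm update (e.g., the jumping coefficients of NHPSO-JTVAC) only changes the loop body; the CV-MAPE fitness and the log-scaled decoding stay the same.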


Data and Preparation
This paper presents a hybrid artificial intelligence (AI) model to predict the lead time in the shipyard block process. As shown in Table 2, we applied it to two datasets collected from a shipyard's block assembly and pre-outfitting processes to evaluate the proposed model. The assembly and pre-outfitting processes consisted of information from 4779 and 4198 blocks, respectively. Each dataset was split into training and test data. Eighty percent of each dataset (3823 and 3358 data points) was used for training, and 20% of each dataset (956 and 840 data points) was used for testing. The target value (label) of each dataset was the lead time. A part of the original dataset is shown in Figure 4a,b.

Data Normalization
To eliminate the effect of significant differences between the different feature scales on the learning speed, prediction accuracy, and generalizability of the SVM, we performed normalization preprocessing on the training and test samples, and the data were normalized to [0, 1]. The normalization formula was as follows:

x' = (x − x_min) / (x_max − x_min)

where x_min and x_max are the minimum and maximum values of the corresponding feature.
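A minimal sketch of this min-max scaling, fitting the minimum and maximum on the training set only and reusing them for the test set so that no test information leaks into preprocessing (function names are illustrative):

```python
import numpy as np

def fit_minmax(train):
    """Compute per-feature minimum and range from the training set."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span = np.where(span > 0, span, 1.0)  # guard against constant columns
    return lo, span

def minmax_transform(data, lo, span):
    """Scale data to [0, 1] using training-set statistics."""
    return (data - lo) / span

train = np.array([[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]])
test = np.array([[2.0, 300.0]])
lo, span = fit_minmax(train)
train_scaled = minmax_transform(train, lo, span)
test_scaled = minmax_transform(test, lo, span)
```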


Feature Selection
Feature selection (FS) of the machine learning model is essential. FS avoids problems caused by high data dimensionality and reduces learning difficulty. We performed feature engineering and removed irrelevant features.
As shown in Figure 5, the FS steps were as follows:

Step 1. The data are split into the training and test sets (8:2).
Step 2. The random forest (RF) model is trained using the training set.
Step 3. The importance score for each feature in the training set is calculated, and the features are ranked by their importance scores.
Step 4. If the model's accuracy and execution time are not satisfactory, the feature with the minimum importance score is deleted from the dataset, and Steps 2 and 3 are repeated until the desired number of features is obtained. Otherwise, the feature subset is obtained directly.
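These steps can be sketched with scikit-learn's random forest. The synthetic data (only the first two of five features carry signal) and all names below are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative data: only features 0 and 1 are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 0.1, size=300)

def rank_features(X, y):
    """Steps 2-3: train an RF and rank features by importance score."""
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    rf.fit(X, y)
    scores = rf.feature_importances_
    return np.argsort(scores)[::-1], scores  # indices, most important first

# Step 4 (one elimination round): drop the least important feature.
order, scores = rank_features(X, y)
X_reduced = np.delete(X, order[-1], axis=1)
order2, _ = rank_features(X_reduced, y)
```

In practice the elimination round would be wrapped in a loop with an accuracy/runtime check, as Step 4 describes.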

Parameter Setting
As shown in Figure 6, for comparison with the proposed NHPSO-JTVAC algorithm, we also applied other meta-heuristic algorithms, such as PSO, BA, GA, and GOA [38-41], and compared the performance of each algorithm with the others. In all the algorithms, the population size was unified and set to 20, and the number of iterations was set to 500. The search space dimension of each algorithm was set to 3, representing the three parameters (C, ε, σ) of the SVM. The search range of C was set to [10^0, 10^3], ε was set to [10^−3, 10^0], and σ was set to [10^−2, 10^1]. The remaining parameters of the NHPSO-JTVAC were set as listed in Table 3. Furthermore, the initial parameters of PSO, GA, BA, and GOA were set as listed in Appendix A (Table A1).

Table 3. Parameter settings: number of search dimensions, 3; range of SVM parameters, C ∈ [10^0, 10^3], ε ∈ [10^−3, 10^0], σ ∈ [10^−2, 10^1]; maximum number of generations, 500.

Performance Metrics
To measure prediction accuracy, this study applied certain widely used regression prediction performance metrics: root-mean-square error (RMSE), mean absolute error (MAE), and MAPE, as shown in Table 4. Here, N is the sample size, y_i is the actual value, and ŷ_i is the predicted value.

Metrics Calculation

RMSE(y, ŷ) = √((1/N) Σ_{i=1}^{N} (y_i − ŷ_i)²)
MAE(y, ŷ) = (1/N) Σ_{i=1}^{N} |y_i − ŷ_i|
MAPE(y, ŷ) = (100%/N) Σ_{i=1}^{N} |(y_i − ŷ_i)/y_i|

Lower values of MAPE, MAE, and RMSE indicate higher accuracy of the model, meaning that the prediction results are more convincing. According to the MAPE metric, which has been widely applied to evaluate industrial and business data, MAPE < 10% can be considered highly accurate forecasting; 10% < MAPE < 20% can be considered good forecasting; 20% < MAPE < 50% can be considered reasonable forecasting; and MAPE > 50% indicates inaccurate forecasting [42].
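For reference, the three metrics can be computed directly; the small worked example below is illustrative:

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    # Percentage error; assumes no actual value y_i is zero.
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

y = np.array([10.0, 20.0, 30.0])
y_hat = np.array([12.0, 18.0, 33.0])
m = mape(y, y_hat)  # (20% + 10% + 10%) / 3 ≈ 13.33%
```

A MAPE of about 13.3% would fall in the 10-20% "good forecasting" band under the interpretation cited above [42].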

Experimental Results
We conducted prediction experiments using test data to verify the proposed NHPSO-JTVAC-SVM model. We compared the model with the SVM, PSO-SVM, BA-SVM, GA-SVM, and GOA-SVM models. The 5-fold CV scores in the iterative process of the integrated models are shown graphically in Figure 7a,b. The best 5-fold CV scores searched by the five models are listed in Table 5. The results demonstrated that the NHPSO-JTVAC algorithm had the best search performance, with best fitness values of 12.92% and 20.19% in the block assembly process performance dataset and the pre-outfitting process performance dataset, respectively.
Table 6 shows the optimal values of the three SVM parameters (C, ε, and σ) for each SVM-based model. In addition, Table 7 shows the test accuracy of these models based on the MAE, RMSE, and MAPE. The test error of MAPE, which we set as the fitness function of the optimization process, is shown graphically in Figure 8. We observed that the NHPSO-JTVAC-SVM model had the smallest MAPE in the training set (5-fold CV) and the smallest error in the test set. The results indicated that the test errors of the NHPSO-JTVAC-SVM model were the smallest among these models. In the block assembly process performance dataset, the MAPE of the NHPSO-JTVAC-SVM algorithm was 11.79%, and the MAE was 0.89. Moreover, in the pre-outfitting process performance dataset, the MAPE and MAE were 17.86% and 0.96, respectively. In addition, the NHPSO-JTVAC-SVM model was significantly better than the SVM model.

Table 8 lists the average MAPE values based on the two datasets and obtained using SVM, PSO-SVM, NHPSO-JTVAC-SVM, BA-SVM, GA-SVM, and GOA-SVM. The average MAPE for the NHPSO-JTVAC-SVM model was 14.83%, which was the smallest among the AI models. Furthermore, Figure 9 shows the predicted results of the test set for the different datasets, wherein the NHPSO-JTVAC-SVM model was superior in solving the lead-time-prediction problems.

Finally, we compared the proposed NHPSO-JTVAC-SVM model with other conventional ML models, such as the ElasticNet and adaptive boosting (AdaBoost) models. The results indicated that the NHPSO-JTVAC-SVM model we developed had the best performance (Table 9).

Conclusions
Based on the analysis of the parameter performance of SVMs, this paper proposes a hybrid NHPSO-JTVAC-SVM lead-time-prediction model. It fully utilizes the global search feature of the NHPSO-JTVAC algorithm to optimize the parameters of an SVM, which overcomes the blindness of SVM parameter selection. Compared with commonly used methods, the parameter selection in this paper provides clearer theoretical guidance. Additionally, in the process of searching for parameters, the NHPSO-JTVAC algorithm is superior in terms of performance. Furthermore, the experimental results indicated that the NHPSO-JTVAC-SVM prediction model has good prediction accuracy. Overall, the results indicated that the optimized model is better than other machine learning models.
Note that the fitness function used in this study was the MAPE. Although the test error MAPE of the NHPSO-JTVAC-SVM model was better than that of the other models, other performance metrics such as the RMSE were worse than those of models such as GOA-SVM and GA-SVM. To optimize the model further, we may develop an optimization algorithm that considers multiple fitness functions, which is an important aspect of future research.

Figure 1. Differences in planned and actual lead times.

Figure 2. SVM flow chart based on the NHPSO-JTVAC algorithm.


Figure 4. Original dataset of (a) block assembly process and (b) block pre-outfitting process.


Figure 5. Flow chart of feature selection.

Figure 6. Flow chart of the prediction models.


Figure 8. Test MAPE of each model in (a) block assembly process performance dataset and (b) block pre-outfitting process performance dataset.


Figure 9. NHPSO-JTVAC-SVM model's predicted results of the (a) block assembly process performance test set and (b) block pre-outfitting process performance test set.

Table 1. Research literature on SVM optimization techniques.

Table 5. K-fold CV scores (MAPE) of each model.


Table 7. Test errors of each model.

Table 6. Optimal values of the three SVM parameters (C, ε, and σ).

Table 9. Comparison with other machine learning models.