Article

Prediction of Ship Painting Man-Hours Based on Selective Ensemble Learning

School of Mechanical Engineering, Jiangsu University of Science and Technology, Zhenjiang 212100, China
* Author to whom correspondence should be addressed.
Coatings 2024, 14(3), 318; https://doi.org/10.3390/coatings14030318
Submission received: 12 January 2024 / Revised: 22 February 2024 / Accepted: 4 March 2024 / Published: 6 March 2024
(This article belongs to the Special Issue The Present Status of Thermally Sprayed Composite Coatings)

Abstract
The precise prediction of painting man-hours is significant to ensure the efficient scheduling of shipyard production and maintain a stable production pace, which directly impacts shipbuilding cycles and costs. However, traditional forecasting methods suffer from issues such as low efficiency and poor accuracy. To solve this problem, this paper proposes a selective ensemble learning model (ISA-SE) based on an improved simulated annealing algorithm to predict ship painting man-hours. Firstly, the improved particle swarm optimization (MPSO) algorithm and data grouping techniques are employed to achieve the optimal selection and hyperparameter optimization of base learners, constructing a candidate set of base learners. Subsequently, the simulated annealing algorithm is improved by adding random perturbations and using a parallel perturbation search mechanism to enhance its global search capability. Finally, an optimal set of base learners is selected from the candidate set using the ISA-SE model, and a heterogeneous ensemble learning model is constructed from this optimal set to achieve precise prediction of ship painting man-hours. The results indicate that the proposed ISA-SE model demonstrates improvements in accuracy, mean absolute error, and root mean square error compared with other models, validating its effectiveness and robustness in predicting ship painting man-hours.

1. Introduction

Ship painting, as one of the three pillar processes of modern ship construction, runs through the entire shipbuilding process. Serving as a productive method for hull protection, it markedly mitigates the harmful impact of the ship's demanding operational environment, curbing hull corrosion, cracking, and related issues. Beyond affording robust protection, painting operations also confer aesthetic enhancements, diminish resistance, and yield other ancillary benefits [1,2]. According to data from the China Shipbuilding Research Institute, painting accounts for 8–10% of the total price of newly constructed ships in the domestic shipbuilding process. In advanced shipyards, such as those in Japan and South Korea, this proportion ranges from 5% to 6%, with specific shipyards even achieving percentages as low as 3% to 5%. In addition, as far as painting man-hours are concerned, some leading domestic shipbuilding companies consume three times as many man-hours as German shipyards did 15 years ago. Consequently, a considerable disparity exists between domestic painting costs and efficiency and those of developed countries, and addressing this gap is imperative to enhance painting efficiency and reduce costs. Man-hours, serving as a pivotal unit for cost measurement, constitute a crucial metric for the rational planning and scheduling of enterprise production. Rapid and precise estimation of painting man-hours stands as an effective strategy for elevating enterprises' international competitiveness.
Within the realm of ship painting, numerous uncertainties arise from the interplay of personnel, processes, and environmental factors, rendering the prediction of man-hours a challenging endeavor. Typically, enterprises rely on process flow and historical experience to forecast man-hours; however, the accuracy of such predictions often falls short because of the multifaceted nature of the influencing factors, making precise task assignment a formidable challenge. Establishing a rational and efficient method for predicting painting man-hours makes it possible to allocate man-hours to each painting task precisely. This, in turn, facilitates effective control over painting costs and production schedules, thereby enhancing the competitive edge of shipbuilding enterprises on the international stage.
The extensive application of machine learning and deep learning has proven effective not only in solving linear problems [3] but also in exhibiting superior predictive capabilities for nonlinear issues [4], coupled with a robust self-learning capacity [5]. Scholars, both domestically and internationally, have conducted predictive studies on man-hours across various industries, including mechanical and electrical engineering [6] as well as construction and building materials [7]. Minhoe Hur et al. [8] employed both the MLR (multiple linear regression) and CART (classification and regression trees) models, which balance model interpretability and accuracy, for predicting shipbuilding man-hours; their research demonstrated that the proposed models outperform traditional methods and expert predictions. Nur Najihah Abu Bakar et al. [9] proposed a data-driven approach for forecasting ship berthing for cold ironing, which includes models such as artificial neural networks and decision trees; the results show that the artificial neural network model can handle the complex nonlinear port-activity prediction problem. The authors of [10] introduced a data-driven approach combining multiple linear regression (MLR) with building information modeling (BIM) to establish a work-hour prediction model, and its feasibility was validated through application to estimating labor man-hours in steel structure manufacturing. However, a single machine learning model has limitations.
Ensemble learning, which combines a series of learners through certain rules to achieve greater generalization performance than individual learners [11], is categorized into homogeneous and heterogeneous ensembles [12]. Heterogeneous ensembles, in particular, have shown outstanding predictive effectiveness across various fields. Shuli Wen et al. [13] proposed a heterogeneous ensemble method for interval forecasting of solar power output based on a stochastic ship motion model and verified its feasibility on the power system of a large oil tanker, providing a reliable reference for ship power system operators seeking better energy management. Zhou Sun et al. predicted ship fuel consumption by constructing a heterogeneous ensemble learning model, which was experimentally verified to produce excellent prediction results. Zeyu Wang et al. [14] compared 57 heterogeneous ensemble learning models constructed from six base models using an exhaustive search method to identify the optimal model for predicting building energy consumption. Zhao Yuexu et al. [15] built a heterogeneous ensemble learning model from the perspectives of model, sample, and parameter diversity, aiming to predict the duration of traffic accidents. These studies indicate that constructing heterogeneous ensemble models can effectively improve prediction accuracy. However, these models often lack optimization of the hyperparameters of the base learners, potentially leading to suboptimal solutions and strong sensitivity to the choice of base learner hyperparameters. To address this, Park Uyeo et al. [16] optimized the hyperparameters of the base learners when constructing heterogeneous ensemble models, thereby enhancing the persuasiveness of their predictions. Chen Cheng et al. [17] proposed an ensemble learning (EL)-based dynamics model for unmanned surface vehicles and used particle swarm optimization and a genetic algorithm to optimize the hyperparameters of the base learners.
While ensemble learning has demonstrated outstanding predictive performance across various fields, the use of a large number of learners may lead to redundancy and longer computational times, potentially giving rise to issues such as overfitting and underfitting in prediction outcomes [18]. To address this challenge, selective ensemble learning has emerged [19]. Its core idea is to select superior learners from a group of individual learners according to a specific strategy and then integrate them into a classifier with better generalization. Shuai Liu et al. [20] proposed a homogeneous selective ensemble forecasting framework based on an improved differential evolution algorithm to enhance the accuracy of hydrological forecasting. Huaiping Jin et al. [21] employed a soft measurement method based on data augmentation and selective ensembles for predicting measurement results, validating the effectiveness and excellence of this approach. Zhang Fan et al. [22] introduced a selective ensemble learning method based on local model prediction accuracy together with an adaptive weight calculation method for submodels; simulations on real spatial dynamic wind power prediction data demonstrated the effectiveness of the proposed method in handling nonlinear and multi-rate data regression problems in wind power generation. Additionally, Huaiping Jin et al. [23] proposed a selective ensemble model based on finite mixture Gaussian process regression for wind power prediction; the results outperformed traditional global and ensemble wind power prediction methods, effectively addressing the temporal changes in wind power data while maintaining high predictive accuracy. However, it is worth noting that the scholars mentioned above did not screen the individual learners when constructing their selective ensemble learning models, an oversight that can directly impact the final predictive results.
The studies above show that a single learner has limitations; ensemble learning offers higher generalization performance and is better suited to research in complex contexts, and selective ensemble learning further optimizes the individual base learners on top of ensemble learning, refining the model structure and improving its performance. However, in the studies above, the selected individual base learners were not analyzed for their compatibility with the research context, and direct selection may lead to non-optimal results.
Despite the positive applications of machine learning and deep learning across various fields, ranging from single models to ensemble learning formed by combining multiple models, experts and scholars have consistently affirmed their outstanding performance in prediction. However, in the context of predicting ship painting man-hours, there is a relatively limited amount of reported research. Due to the multitude of uncertainties affecting ships during the painting process, leading to the instability of painting man-hours, reliance on a single predictive model is restrictive and results in a significant decline in accuracy. Therefore, employing multiple algorithmic models with distinct characteristics as base learners to construct a selective ensemble learning model for prediction holds the potential to enhance accuracy and robustness significantly.
In view of the above shortcomings, this paper proposes a selective integrated learning model (ISA-SE) based on an improved simulated annealing algorithm to predict the man-hours of ship painting. This model employs ten different algorithms, including random forest (RF), support vector regression (SVR), and extreme gradient boosting (XGBoost), as base learners. Initially, the hyperparameters of these ten base learners are optimized through cross-validation and the MPSO algorithm. Subsequently, the dataset is randomly divided into six subsets using data grouping techniques. The optimal hyperparameters of the base learners are utilized as initial data, and predictions are made using each base learner separately. Base learners that meet the criteria are selected as candidate learners based on a comparison of the prediction results. To address the potential issue of local optima in the simulated annealing algorithm, a selection approach incorporating increased random perturbations is adopted, and a parallel perturbation search is conducted on this selection path to expand the search range. Finally, the improved simulated annealing algorithm is applied to further filter the candidate learners, selecting the optimal combination of diverse base learners to predict ship painting duration accurately.

2. Establishment of Selective Ensemble Learning Model

2.1. Selective Ensemble Learning

The core idea of selective ensemble learning is to select a representative subset of classifiers from numerous classifiers for combination, eliminate classifiers with poor classification performance and redundancy, and comprehensively improve the generalization performance and prediction efficiency of classifiers. The basic framework is shown in Figure 1.

2.2. Selective Integration Technology

In ensemble models, algorithms are often chosen based on experience. However, the predictive capabilities of models can vary across contexts, making it difficult to guarantee prediction accuracy. Additionally, when combining basic algorithms, the multitude of possible combinations makes traditional exhaustive search methods time-consuming. Moreover, the complex structure of ensemble models makes it challenging for conventional mathematical methods to allocate weights effectively and reasonably. To address these issues, this paper proposes a selective ensemble learning model named ISA-SE, which comprises four essential parts: optimization of learner hyperparameters, performance screening of learners, improvement of the simulated annealing algorithm, and optimization of learner combinations.

2.2.1. Optimization of Learner Hyperparameters

In this study, ten base learners were selected based on the characteristics of the ship painting dataset, and the hyperparameters that significantly influence the performance of these learners were optimized. This optimization aims to identify the optimal parameter model within the specific context of this dataset. Common methods for hyperparameter tuning include manual adjustment [24], grid search [25], random search [26], and machine learning-based approaches [27,28]. Machine learning algorithms, through iterative processes, are adept at rapidly identifying the most effective combinations of parameters. In this research, an enhanced particle swarm optimization (MPSO) algorithm [29] was employed to determine the optimal parameters of the base learners, thus creating the MPSO-X model. MPSO incorporates mutation operations from genetic algorithms into the standard particle swarm framework to mitigate the issue of local optima. The process of tuning the hyperparameters, such as the error penalty factor C and the kernel function coefficient g for the base learner SVR, is illustrated in Figure 2. The hyperparameter optimization for other base learners is similarly conducted using this computational process.
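To make this tuning step concrete, the following is a minimal sketch of a PSO-style hyperparameter search with a GA-style mutation step for SVR's error penalty factor C and kernel coefficient g, scored by five-fold cross-validated R². The swarm size, bounds, inertia/acceleration coefficients, mutation rate, and placeholder data are illustrative assumptions, not the settings of the paper's MPSO.

```python
# Sketch: PSO with a GA-style mutation step tuning SVR's C and g (gamma)
# via cross-validated R^2. All numeric settings here are assumed values.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
bounds = np.array([[0.1, 100.0],   # search range for C (assumed)
                   [1e-3, 1.0]])   # search range for g (assumed)

def fitness(p):
    model = SVR(C=p[0], gamma=p[1])
    return cross_val_score(model, X, y, cv=5, scoring="r2").mean()

rng = np.random.default_rng(0)
n_particles, n_iters, w, c1, c2, pm = 20, 30, 0.7, 1.5, 1.5, 0.1
pos = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iters):
    r1, r2 = rng.random((2, n_particles, 1))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, bounds[:, 0], bounds[:, 1])
    # Mutation borrowed from genetic algorithms to escape local optima.
    mutate = rng.random(n_particles) < pm
    pos[mutate] = rng.uniform(bounds[:, 0], bounds[:, 1], size=(mutate.sum(), 2))
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print(f"best C={gbest[0]:.3f}, g={gbest[1]:.4f}, CV R^2={pbest_fit.max():.3f}")
```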

2.2.2. Performance Screening of Learners

Utilizing data grouping techniques, the preprocessed dataset was randomly divided into subsets, which were then individually trained using the optimized models of the base learners. A comparative analysis of the training outcomes was conducted to evaluate the predictive precision of each base learner across different training sets. This involved calculating the overall mean accuracy of each learner across all training sets and comparing these means with the average predictive accuracies of other algorithmic models. Subsequently, the comparative results were systematically ranked, leading to the exclusion of models demonstrating considerable variance and subpar average predictive accuracy across diverse datasets. Figure 3 illustrates this methodology in detail.
In this study, the coefficient of determination, $R^2$, was adopted as the metric for evaluating predictive accuracy. The $R^2$ value ranges from 0 to 1, with values closer to 1 indicating a higher proportion of variance in the dependent variable $y$ that is predictable from the independent variable $x$. This denotes a closer alignment of the regression line to the observed data points. A higher $R^2$ value reflects a better fit of the model, implying that the variability in $x$ more effectively explains the variations in $y$, thereby signifying enhanced predictive accuracy.
The sum of squared residuals is:
$$SS_{res} = \sum_{i} (y_i - \hat{y}_i)^2$$
The total sum of squares is:
$$SS_{tot} = \sum_{i} (y_i - \bar{y})^2$$
The expression for R 2 is:
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}$$
where $y_i$ represents the true value, $\hat{y}_i$ the predicted value, and $\bar{y}$ the sample mean.
The average accuracy of the algorithm model is:
$$D_X = \frac{1}{n} \sum_{i=1}^{n} R_i^2$$
where $X$ denotes the algorithm, $n$ the number of data groups, and $R_i^2$ the $R^2$ value of the $i$-th group of data.
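As an illustration of this screening step, the sketch below splits a placeholder dataset into six random groups, evaluates a few candidate learners on each group with $R^2$, and retains those whose mean accuracy $D_X$ exceeds 0.7 (the threshold used later in Section 2.4). The learners, data, and split sizes are stand-ins, not the paper's configuration.

```python
# Sketch of grouping-and-screening: six random subsets, per-subset R^2 for
# each candidate learner, then keep learners whose mean R^2 clears 0.7.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1200, n_features=3, noise=5.0, random_state=0)
groups = np.array_split(np.random.default_rng(0).permutation(len(X)), 6)

learners = {"RF": RandomForestRegressor(random_state=0),
            "GBDT": GradientBoostingRegressor(random_state=0),
            "SVR": SVR()}

candidates = []
for name, model in learners.items():
    scores = []
    for idx in groups:
        Xg, yg = X[idx], y[idx]
        Xtr, Xte, ytr, yte = train_test_split(Xg, yg, test_size=0.25,
                                              random_state=0)
        scores.append(r2_score(yte, model.fit(Xtr, ytr).predict(Xte)))
    d_x = np.mean(scores)  # D_X: mean R^2 over the six groups
    print(f"{name}: D_X = {d_x:.3f}, range = {max(scores) - min(scores):.3f}")
    if d_x > 0.7:
        candidates.append(name)
print("candidate set:", candidates)
```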

2.3. Improved Simulated Annealing Algorithm

The simulated annealing algorithm [30,31,32], owing to its analogy with the physical annealing process, is widely applied to combinatorial optimization problems. Because such combinations can be complex, improving the simulated annealing algorithm can enhance its efficiency and accuracy. Consequently, after the base learners are optimized and screened via the MPSO algorithm and data grouping techniques, an improved simulated annealing algorithm, combined with an average weighting method, is employed to select the optimal set of base learners from the candidate pool and construct a heterogeneous ensemble learning model. This model is adept at achieving accurate predictions of the man-hours required for ship painting.
In the process of optimizing combinations of candidate learners with the simulated annealing algorithm, random perturbations can lead to local optima. To avoid this, the number of disturbance paths in each random perturbation is increased from one to three: three paths are generated simultaneously, yielding three new solutions, and the best of the three is selected, which widens the search range. Additionally, during combinatorial optimization the ensemble may contain redundant learners or multiple learners of the same type, which diminishes its generalization capability. To prevent this, we redefine the integration accuracy $E(q)$ used for generating new solutions under the Metropolis criterion by introducing the $R^2$ and Kappa coefficients, with $\lambda$ regulating their weights, so that a new solution guarantees both prediction accuracy and diversity among the base learners. The redefined formula for $E(q)$ is as follows:
$$E(q) = R^2(q) \times \lambda + \bar{K}_a(q) \times (1 - \lambda)$$
where $q$ is the new solution (base learner combination) screened in each iteration, $R^2$ evaluates the performance of the combination, $\bar{K}_a$ denotes the average of the Kappa coefficients, and $\lambda$ regulates the weights of $R^2$ and $\bar{K}_a$ within $E(q)$: the larger $\lambda$ is, the greater the role of $R^2$; the smaller $\lambda$ is, the greater the role of $\bar{K}_a$. The expression for $\bar{K}_a$ is:
$$\bar{K}_a = \frac{1}{n} \sum_{i=1}^{n} K_{a_i}$$
where $K_{a_i}$ denotes the Kappa coefficient of the $i$-th pair of base learners.
The Kappa coefficient takes values in the range $[-1, 1]$. The closer it is to $-1$, the greater the prediction variability between the two learners; the closer it is to 1, the greater their agreement. The coefficient is calculated by the formula:
$$K_a = \frac{\varphi_1 - \varphi_2}{1 - \varphi_2}$$
where $\varphi_1$ is the observed agreement between the two learners and $\varphi_2$ is the agreement expected by chance. They are calculated as:
$$\varphi_1 = \frac{a + d}{m}$$
$$\varphi_2 = \frac{(a + b)(a + c) + (c + d)(b + d)}{m^2}$$
where $a$ ($d$) denotes the number of samples predicted correctly (incorrectly) by both learners, $b$ ($c$) denotes the number of samples for which one learner is correct and the other incorrect, and $m$ is the total number of samples. Table 1 gives the joint prediction outcomes of two different learners $h_i$ and $h_j$ ($i, j = 1, 2, \ldots, N$; $i \neq j$). These counts satisfy:
$$a + b + c + d = m$$
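A small sketch of how these quantities can be computed is given below: pairwise Kappa values from the $a$, $b$, $c$, $d$ counts of Table 1, their average $\bar{K}_a$, and the blended metric $E(q)$. Treating a regression prediction as "correct" when its relative error is below 10% is our assumption for mapping regression outputs onto the correct/incorrect table; the paper does not state its criterion.

```python
# Sketch of the diversity term: pairwise Kappa from the a, b, c, d counts
# of Table 1, averaged over learner pairs, then blended with R^2 as
# E(q) = lambda * R^2 + (1 - lambda) * K_bar.
from itertools import combinations
import numpy as np

def kappa(ok_i, ok_j):
    """Kappa of two learners from boolean correct/incorrect vectors."""
    m = len(ok_i)
    a = np.sum(ok_i & ok_j)    # both correct
    b = np.sum(ok_i & ~ok_j)   # i correct, j incorrect
    c = np.sum(~ok_i & ok_j)   # i incorrect, j correct
    d = np.sum(~ok_i & ~ok_j)  # both incorrect
    phi1 = (a + d) / m
    phi2 = ((a + b) * (a + c) + (c + d) * (b + d)) / m ** 2
    return (phi1 - phi2) / (1 - phi2)

def ensemble_metric(preds, y_true, r2, lam=0.5, tol=0.10):
    """E(q) for a combination; `preds` maps learner name -> predictions.
    The 10% relative-error correctness rule (tol) is an assumption."""
    ok = {k: np.abs(p - y_true) / np.abs(y_true) < tol
          for k, p in preds.items()}
    kappas = [kappa(ok[i], ok[j]) for i, j in combinations(preds, 2)]
    return lam * r2 + (1 - lam) * np.mean(kappas)
```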
The improved simulated annealing algorithm is utilized to randomly initialize and select learners from the set of candidate base learners. To ascertain whether each learner participates in the ensemble, a binary representation is employed, where ‘1’ signifies the selection and ‘0’ indicates the exclusion of a particular learner from the candidate set. This selection process is visually represented in the encoding structure, as illustrated in Figure 4. The procedure for selecting learners based on the improved simulated annealing algorithm is systematically detailed in Algorithm 1.
Algorithm 1. Selective integration algorithm based on improved simulated annealing
Input: set of candidate learners $N = \{n_1, n_2, n_3, \ldots, n_n\}$
Output: set of base learners $M = \{m_1, m_2, m_3, \ldots, m_m\}$
Step 1: Initialize the temperature $t > 0$, generate the initial solution $p = \{1, 0, 0, \ldots, 1\}$ from the candidate set $N$, and compute its integration accuracy $E(p)$;
Step 2: Perturb the current solution $p$ in one of the following ways:
(1)
Remove one or two learners with poor prediction results from the set of p ;
(2)
Add one or two new learners to the set of p ;
(3)
Remove one or two learners with poor predictions from the set of p and add one or two new learners.
Using parallel perturbation, three new solutions $q_i$ ($i = 1, 2, 3$) are randomly generated according to a given probability, and the integration accuracy $E(q_i)$ of each new solution is computed using the metric defined above;
Step 3: Compare the three values to obtain the maximum $E(q)_{\max}$ of the $E(q_i)$ and judge whether to accept it based on the Metropolis criterion;
Step 4: A solution stabilized at temperature $t$ is obtained by continuous iteration. If the number of times a new solution is accepted at a given temperature exceeds a set threshold, the temperature is decreased by a larger step;
Step 5: If the termination condition is met, the algorithm terminates and the current solution is taken as the final integrated set of classifiers; otherwise, continue to Step 6;
Step 6: Update the temperature $t$ according to the given temperature control function and jump to Step 2.
The Metropolis criterion is:
$$P = \begin{cases} 1, & E(q)_{\max} > E(p) \\ \exp\left( \dfrac{E(q)_{\max} - E(p)}{\lambda t} \right), & E(q)_{\max} \le E(p) \end{cases}$$
where $t$ denotes the temperature, $\lambda$ the scaling factor of the temperature decrease, and $P$ the probability of accepting the new solution. When $E(q)_{\max} > E(p)$, the new solution is more accurate than the solution in the previous state and is accepted with probability 1; when $E(q)_{\max} \le E(p)$, its accuracy is lower than or equal to that of the previous solution and it is accepted with probability $\exp\left( (E(q)_{\max} - E(p)) / (\lambda t) \right)$.
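The following is a compact sketch of the improved selection loop under these rules: a binary inclusion mask over the candidate set, three parallel perturbations per iteration, and Metropolis acceptance of the best perturbed solution. A simplified bit-flip move stands in for the remove/add moves of Algorithm 1, and `evaluate` is a placeholder for the $E(q)$ metric; the cooling parameters mirror the experimental settings reported in Section 4.4.

```python
# Sketch of the improved SA loop: binary mask over the candidates, three
# parallel perturbation paths per step, Metropolis acceptance of the best.
import math
import random

def perturb(mask, rng):
    """Flip one or two inclusion bits (a simplified stand-in for the
    remove/add moves of Algorithm 1); never return an empty ensemble."""
    new = mask.copy()
    for i in rng.sample(range(len(new)), k=rng.choice([1, 2])):
        new[i] ^= 1
    return new if any(new) else mask.copy()

def improved_sa(n_candidates, evaluate, t=200.0, t_min=0.01, alpha=0.98,
                lam=1.0, seed=0):
    rng = random.Random(seed)
    current = [1] * n_candidates          # start with all candidates selected
    e_cur = evaluate(current)
    while t > t_min:
        trials = [perturb(current, rng) for _ in range(3)]  # 3 parallel paths
        best = max(trials, key=evaluate)
        e_new = evaluate(best)
        # Metropolis: always accept improvements; otherwise accept with
        # probability exp((E_new - E_cur) / (lambda * t)).
        if e_new > e_cur or rng.random() < math.exp((e_new - e_cur) / (lam * t)):
            current, e_cur = best, e_new
        t *= alpha
    return current, e_cur

# Toy objective rewarding small ensembles that include learner 0.
mask, score = improved_sa(8, lambda m: m[0] + 1.0 / (1 + sum(m)))
print(mask, round(score, 3))
```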

2.4. Selective Integration Technology Forecasting Framework

The calculation flow for predicting ship painting man-hours based on selective ensemble learning is shown in Figure 5.
The specific steps are as follows:
Step 1: Preprocess the data on the factors influencing ship painting man-hours, optimize the hyperparameters of the ten selected base learners by five-fold cross-validation combined with the MPSO algorithm, and use the optimal hyperparameters as the initial parameters of these learners;
Step 2: Divide the dataset into six parts using the data grouping technique, train the ten base learners, and analyze the output of each learner. Learners with high stability and an evaluation index above 0.7 are retained as candidate base learners, while those that do not meet the requirements are eliminated;
Step 3: Improve the simulated annealing algorithm and use it to optimize the combination of candidate base learners, validated by the three evaluation metrics of MAE, RMSE, and accuracy; the combinations of base learners with high diversity and high accuracy are selected;
Step 4: Input the test set into the selected combination of learners and combine the outputs of the individual learners by the average weighting method to predict ship painting man-hours (a minimal sketch of this averaging step follows). The proposed ISA-SE model is compared with the selective ensemble learning model built on the traditional simulated annealing algorithm to verify its effectiveness in predicting ship painting man-hours.
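For clarity, here is a minimal sketch of the averaging step in Step 4, assuming already-fitted scikit-learn-style models exposing a `predict` method:

```python
# Sketch of Step 4: outputs of the selected learners are combined with the
# average weighting method, i.e., an unweighted mean of their predictions.
import numpy as np

def ensemble_predict(selected_models, X_test):
    """Equal-weight average of the predictions of the optimal learner subset."""
    preds = np.column_stack([m.predict(X_test) for m in selected_models])
    return preds.mean(axis=1)
```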

3. Ship Painting Man-Hour Model Analysis and Modeling

3.1. Data Pre-Processing

3.1.1. Data Acquisition and Characterization

According to ship coating and painting technology and shipyard research, painting in the ship construction process includes not only coating operations but also surface pretreatment and secondary descaling before coating. Owing to the special characteristics of the shipbuilding process, painting operations are distributed across all stages of ship construction. The ship painting stages include raw material shot blasting pretreatment, workshop primer painting, and in-dock painting before the section, stage, wharf, and delivery phases. Ship painting man-hours are the time that should be spent to complete the painting work according to the specified process.
The ship painting process is highly complex and time-intensive. Painting efficiency is influenced by a multitude of factors, leading to significant uncertainty in the time required for completion; the predominant factors include operator skill, working environment, production processes, equipment, and raw materials. Drawing on this theory and combining the shipyard database with ship painting manuals, this study classifies the factors that mainly affect painting man-hours, such as worker age, technical level, steel surface cleanliness, spraying method, film thickness, coating area, segmental structure, temperature, humidity, and wind force, into four categories: human attributes, process attributes, equipment and material attributes, and environmental attributes. These factors are used as inputs to verify the feasibility of the ISA-SE-based ship painting man-hour prediction model proposed in this paper. The specific categories of factors influencing ship painting duration are delineated in Table 2. Based on these factors, we selected painting data for container ships from two domestic shipbuilding companies; from each company's database, 600 sets of complete records were randomly selected for the experiments, hereafter referred to as the SH and JN datasets.
Given the complexity and intricacy of the influencing factors, our approach initially involves simplifying the model complexity. To this end, we employ Pearson’s correlation coefficient for a comprehensive correlational analysis aimed at identifying the key influencing factors. Subsequently, principal component analysis (PCA) is utilized to perform dimensionality reduction on the retained original data. This methodological strategy effectively streamlines the dataset, ensuring a more focused and efficient analysis while preserving the essential characteristics of the data relevant to the study’s objectives.

3.1.2. Data Encoding and Normalization

Prior to embarking on correlation analysis and dimensionality reduction, it is imperative to convert all variables into numerical data. Quantifiable variables, such as age, coating area, and film thickness, are straightforwardly represented through their numerical values. However, categorical variables like technical level and surface cleanliness necessitate the assignment of specific numerical encodings to facilitate their transformation into quantifiable data. The detailed methodology for this variable conversion process is systematically outlined in Table 3, ensuring a standardized approach for data preprocessing.
The normalization of features is imperative to mitigate the impact of differing scales among features and ensure comparability across various indicators. Standard normalization techniques include linear function normalization and zero-mean normalization. Linear function normalization involves a linear transformation of the raw data, mapping the results to a [0, 1] range, thereby achieving proportional scaling of the original data. The specific formula for this normalization process is as follows:
$$x_i = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}}$$
where $x_i$ represents the normalized value of $X_i$, and $X_{\max}$ and $X_{\min}$ denote the maximum and minimum values of the dataset, respectively.
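A small sketch of this normalization follows, with the bounds fitted on the training split only so that test-set statistics do not leak into the scaling (a standard precaution; the paper does not discuss this detail):

```python
# Min-max normalization as in the formula above; fit the bounds on the
# training split, then apply the same transform to any other split.
import numpy as np

def min_max_fit(X_train):
    """Return per-feature minimum and maximum from the training data."""
    return X_train.min(axis=0), X_train.max(axis=0)

def min_max_apply(X, x_min, x_max):
    """Scale X into [0, 1] using the fitted bounds."""
    return (X - x_min) / (x_max - x_min)
```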

3.1.3. Selection of Variables

In assessing the linear correlation between two variables $X$ and $Y$, the Pearson correlation coefficient serves as a crucial statistical tool. For the preprocessed ship painting man-hour data, Python code can be used to analyze the correlation between the influencing factors and man-hours using this coefficient, and the correlations can be categorized by coefficient magnitude: an absolute value greater than 0.8 indicates a strong correlation between the variables, values between 0.5 and 0.8 suggest a moderate correlation, values from 0.3 to 0.5 denote a weak correlation, and coefficients below 0.3 imply negligible or no correlation. The formula for the Pearson correlation coefficient is as follows:
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
where $r$ represents the Pearson correlation coefficient between $X$ and $Y$, $X_i$ and $Y_i$ are the individual sample points, and $\bar{X}$ and $\bar{Y}$ are the means of $X$ and $Y$, respectively.
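As a sketch of the screening described above, the snippet below correlates placeholder factor columns with man-hours and buckets the absolute coefficients by the stated thresholds; the column names and random data are illustrative, not the paper's variables.

```python
# Pearson screening sketch: correlate each factor with man-hours and bucket
# the absolute coefficients by the thresholds given above.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(0).random((100, 4)),
                  columns=["age", "film_thickness", "coating_area", "man_hours"])
r = df.corr(method="pearson")["man_hours"].drop("man_hours")
bins = pd.cut(r.abs(), [0, 0.3, 0.5, 0.8, 1.0],
              labels=["negligible", "weak", "moderate", "strong"])
print(pd.DataFrame({"r": r.round(3), "strength": bins}))
```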
Correlation analysis based on the Pearson correlation coefficient was performed on the dataset comprising 1200 groups; the findings are comprehensively illustrated in Figure 6.
The bivariate Pearson’s test results elucidate the varying degrees of correlation between multiple factors and ship painting man-hours. Coating area and film thickness are highly correlated with painting man-hours. The technical level, steel surface cleanliness, spraying method, and segmental structure exhibit a moderate correlation. Age and temperature are weakly correlated with painting man-hours, while wind force and humidity have almost no correlation. Among these factors, there is a moderate correlation between age and technical level, but the correlation between other factors is relatively low, avoiding the issue of feature redundancy. Consequently, age, technical level, film thickness, coating area, steel surface cleanliness, temperature, spraying method, and segmental structure are selected as the eight primary features for predicting painting man-hours. However, using all eight features as inputs in the predictive model could potentially lead to an explosion in dimensionality, thereby increasing the variance in test data error. Therefore, simplifying the model is a crucial step in reducing variance, and dimensionality reduction is an effective strategy to address the problem of high dimensionality.
The principal component analysis (PCA) method employs a dimensionality reduction approach, projecting the original data into a few directions with maximum variance. This multivariate statistical analysis technique transforms multiple indicators into a smaller number of composite indices through orthogonal rotation. By applying the PCA algorithm for linear dimensionality reduction and determining the number of principal components based on the cumulative variance contribution rate, the model can be further simplified. This simplification not only enhances the efficiency of the model but also improves its generalizability.
The specific operations of the PCA algorithm are: (1) collect the p-dimensional random vectors and construct a sample matrix; (2) compute the eigenvalues of the sample covariance matrix $\Sigma$, ordered as $\lambda_1 > \lambda_2 > \cdots > \lambda_p$, with corresponding unit eigenvectors $U_1, U_2, \ldots, U_p$; the transformation matrix is $A = U^{\mathrm{T}}$, that is, the $i$-th row of $A$ is the unit eigenvector $u_i$ corresponding to the $i$-th eigenvalue of $\Sigma$, and the variance of the $i$-th principal component $z_i$ equals the $i$-th eigenvalue $\lambda_i$ of $\Sigma$; (3) preserve the principal components whose cumulative variance contribution rate exceeds 90% and use them as input features for the subsequent datasets.
The variance contribution rate of the $k$-th principal component $z_k$ is:
$$\eta_k = \frac{\lambda_k}{\sum_{k=1}^{p} \lambda_k}$$
If $m$ principal components are taken, the cumulative contribution rate of the principal components $z_1, z_2, \ldots, z_m$ is:
$$\zeta_m = \frac{\sum_{k=1}^{m} \lambda_k}{\sum_{k=1}^{p} \lambda_k}$$
In this study, a dataset comprising 1200 samples was employed, with 900 samples for the training set and 300 for the testing set. PCA was applied to the training data for dimensionality reduction. The eigenvalues and the corresponding variance contribution rates were calculated according to the previously described formulae, as detailed in Table 4.
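A minimal sketch of this reduction step with scikit-learn is shown below, using random placeholder arrays in place of the preprocessed features (so the printed component count will not match Table 4's three):

```python
# PCA sketch mirroring the procedure above: fit on the training split and
# keep the smallest number of components whose cumulative variance
# contribution reaches 90%.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train, X_test = rng.random((900, 8)), rng.random((300, 8))

pca = PCA(n_components=0.90)          # retain >= 90% cumulative variance
Z_train = pca.fit_transform(X_train)  # principal components z_1 ... z_m
Z_test = pca.transform(X_test)        # reuse the training-set projection
print(pca.explained_variance_ratio_.cumsum())
```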
As shown in Table 4, the cumulative contribution rate of the eigenvalues of the first three principal components reaches 90%, so three components are selected to replace the original variables. The standardized orthogonal rotation method was used to obtain the factor loadings of each factor on the different principal components; the resulting component matrices are shown in Table 5.
The principal component values obtained from the PCA dimensionality reduction are given by:
$$\begin{aligned} z_1 &= 0.328x_1 + 0.413x_2 + 0.732x_3 - 0.687x_4 - 0.566x_5 + 0.132x_6 + 0.435x_7 - 0.332x_8 \\ z_2 &= 0.597x_1 + 0.298x_2 - 0.445x_3 + 0.713x_4 + 0.312x_5 - 0.377x_6 - 0.233x_7 + 0.053x_8 \\ z_3 &= 0.553x_1 - 0.335x_2 + 0.012x_3 + 0.225x_4 - 0.679x_5 + 0.028x_6 + 0.606x_7 - 0.258x_8 \end{aligned}$$
where $x_1$ through $x_8$ represent age, technical level, film thickness, coating area, steel surface cleanliness, temperature, spraying method, and segmental structure, respectively.

3.2. Basic Learner Parameter Settings

The selection of base learners for the ship painting process was determined based on the specific conditions of ship painting. In the practical ship spray painting process, the data storage for factors affecting painting man-hours is limited, and the prediction process involves nonlinear mapping. Consequently, considering the characteristics of ship painting man-hours and the strengths and weaknesses of various machine learning algorithms, ten suitable base learner models were identified. These models were then optimized for their principal hyperparameters using the modified particle swarm optimization (MPSO) algorithm, as described in Section 2.2.1. The optimization results are presented in Table 6.

3.3. Evaluation Indicators

To assess the efficacy of the proposed selective ensemble learning model in predicting ship painting man-hour, three evaluation metrics are employed: accuracy, mean absolute error (MAE), and root mean square error (RMSE). These metrics provide a comprehensive evaluation of the model’s performance, with accuracy indicating the overall correctness of the predictions, MAE reflecting the average magnitude of the errors in the predictions, and RMSE offering a measure of the square root of the average squared differences between the predicted and actual values. The utilization of these diverse metrics ensures a robust and thorough evaluation of the model’s predictive capabilities in the context of ship painting man-hour estimation.
The accuracy is a measure representing the proportion of correctly predicted samples out of all samples. It includes both correctly predicted positive and negative samples. The formula for calculating accuracy is as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}$$
where TP (true positives) are positive observations predicted correctly, TN (true negatives) are negative observations predicted correctly, FP (false positives) are negative observations wrongly predicted as positive, and FN (false negatives) are positive observations wrongly predicted as negative.
MAE is used to measure the average absolute error between predicted values and true values. A smaller MAE indicates a better model, and its definition is as follows:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|, \quad \mathrm{MAE} \in [0, +\infty)$$
RMSE is used to measure the average size of the prediction error of a model. A smaller RMSE indicates a smaller difference between the predicted and actual values of the model, indicating a higher degree of fit. Its formula is as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}, \quad \mathrm{RMSE} \in [0, +\infty)$$
In the formulas above, $n$ represents the number of samples, $y_i$ the true value of the $i$-th sample, and $\hat{y}_i$ its predicted value.
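These two error metrics can be computed directly, for example (the arrays are placeholders):

```python
# The MAE and RMSE formulas above, computed with scikit-learn.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.9, 6.5])
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f"MAE={mae:.3f}, RMSE={rmse:.3f}")
```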

4. Experiments and Results

In this section, we conducted a thorough analysis of the experimental results. Initially, using data grouping techniques, we compared the accuracy and stability of the various base learners, retaining those meeting the required criteria as candidates. Subsequently, we analyzed the impact of assigning different values to the variable λ in the assessment criterion of the proposed ISA-SE model on the prediction outcomes. Following this, a comparative analysis of the performance differences between the ISA-SE model and each base learner was undertaken. Finally, to further substantiate the improvements to the simulated annealing algorithm, we contrasted the ISA-SE model with a selective ensemble learning model built on the conventional simulated annealing algorithm, providing an in-depth examination of the outcomes. This verification was anchored on a dataset comprising 1200 data samples from container ships, obtained from two shipping companies, and evaluated the predictive accuracy and generalization capacity of the models delineated in the manuscript.

4.1. Optimization and Screening of Base Learners

Utilizing a dataset comprising 1200 data points from two selected companies, this study employed data grouping techniques to randomly distribute the datasets of both companies into six subsets: SH-a, SH-b, SH-c, JN-a, JN-b, and JN-c, with each subset containing 200 data points. These subsets were then utilized as input for various base learners to assess their robustness. The performance of these base learners was quantified using the coefficient of determination (R2), as depicted in Figure 7. Table 7 presents the fluctuation in R2 values of the base learners across the SH and JN datasets, with fluctuations exceeding 10% highlighted in bold.
As inferred from Figure 7, the XGBoost model consistently demonstrates superior accuracy across different datasets. In contrast, the PR model exhibits R2 values below 0.7 in the SH-b, SH-c, and JN-c datasets. Similarly, the ENR model shows R2 values lower than 0.7 in the SH-a, SH-b, SH-c, and JN-a datasets. In the case of other learners, their R2 values are consistently above 0.7. Upon analyzing the average performance across the six datasets, the models rank in descending order of efficacy as follows: extreme gradient boosting (XGBoost), gradient boosting decision tree (GBDT), random forest (RF), adaptive boosting (AdaBoost), support vector regression (SVR), K-nearest neighbors (KNN), back propagation neural network (BPNN), multilayer perceptron (MLP), polynomial regression (PR), and elastic net regression (ENR).
Analysis of Table 7 reveals that the PR model displays substantial fluctuation in R2 values, ranging between 17.39% and 18.53% across the SH and JN datasets. Similarly, the ENR model exhibits a variability range of 10.49% to 20.66%, indicating a lack of stability. In contrast, the other evaluated learning models maintain fluctuation ranges within 10%, underscoring their relative consistency. Based on these observations, it is apparent that both the PR and ENR models, due to their higher variability and lower stability, are not suitable candidates for inclusion in a selective ensemble modeling approach.

4.2. Comparison of Different λ Values in the ISA-SE Model

In the process of optimizing the combination of candidate base learners using the improved simulated annealing algorithm, the parameter λ was employed to adjust the weight within the ensemble accuracy E ( q ) . To investigate the impact of λ on the results of the combinatorial optimization, experiments were conducted with different values assigned to λ . The range of λ values was fixed between 0.3 and 0.7, with a step size of 0.1. The outcomes of these experiments, determined through five-fold cross-validation for various λ values, are presented in Table 8, Table 9 and Table 10.
Table 8, Table 9 and Table 10 present the experimental results of the proposed model for values of λ at 0.3, 0.4, 0.5, 0.6, and 0.7 in terms of accuracy, MAE, and RMSE. Each dataset contains 200 data points, and all experimental results are based on the average of independent data prediction outcomes. The data in the tables are expressed as mean values and standard deviations. The best experimental results are highlighted in bold in Table 8, Table 9 and Table 10.
A detailed examination of the experimental results across various evaluation metrics reveals only minor differences in different λ values within each dataset. This implies that the results of each metric are not entirely dependent on the size of the λ value. Considering the overall performance across all metrics, it was observed that the experimental results are comparatively better when λ is set to 0.5. Therefore, in subsequent experiments within this study, the value of λ is fixed at 0.5.

4.3. Comparison between ISA-SE Model and Various Base Learner Models

This section compares the proposed method with all the base learners that make up the method. Table 11, Table 12 and Table 13 provide detailed experimental results for accuracy, MAE, and RMSE on six datasets, respectively. The table displays the average and standard deviation of each evaluation metric for various algorithms on all datasets, with the best value in bold for each row. Figure 8, Figure 9 and Figure 10 show the average values of accuracy, MAE, and RMSE under 1200 sets of data.
The results in Table 11, Table 12 and Table 13 show that the proposed model performs best on multiple datasets for all three evaluation metrics; among the eight base learner models, GBDT and XGBoost perform best and outperform the proposed model on individual datasets. The results in Figure 8, Figure 9 and Figure 10 show that, compared with the best and worst performers among the base learners, the proposed model improves accuracy by 2.31% and 14.41%, MAE by 2.76% and 10.90%, and RMSE by 1.19% and 13.55%, respectively. As a whole, the proposed model shows better robustness.

4.4. Comparison of Selective Ensemble Learning Models Composed of ISA-SE Model and Traditional Simulated Annealing Algorithm

In this subsection, the proposed method is compared with the selective ensemble learning model based on the traditional simulated annealing (SA) algorithm. Table 14 and Table 15 present the results of the two models under six evaluation metrics: accuracy, MAE, RMSE, the number of iterations, the number of base learners, and the time taken. Table 14 shows the average values and standard deviations of these evaluation metrics for both models across the two datasets, with the best values in each row highlighted in bold. Figure 11 illustrates a detailed comparison between the predicted and actual values, with the vertical axis representing the normalized values of ship painting man-hours. To provide a clearer visualization of the optimization behavior of the proposed algorithm, the experimental parameters of both algorithms were set uniformly: a maximum of 200 iterations, an initial temperature $T_{\max}$ = 200, a final temperature $T_{\min}$ = 0.01, and a decay coefficient $a$ = 0.98. The fitness values of both models throughout the iterative process on the SH and JN datasets are depicted in Figure 12.
As indicated in Table 14, for the two datasets evaluated, the proposed model demonstrates superior performance compared to the traditional model. Specifically, the proposed model exhibits an average accuracy improvement of 4.84% and 5.32%, an MAE reduction of 4.02% and 5.06%, and a decrease in RMSE by 4.95% and 4.69%, respectively. These results clearly show that the proposed model outperforms the traditional model across the three evaluation metrics of accuracy, MAE, and RMSE.
Further analysis of Table 15 reveals that the proposed model requires fewer iterations and less time than the traditional model. This indicates that the proposed model can search more rapidly during the iteration process, effectively reducing the time consumed during the iterative convergence process.
From the analysis of Figure 12, it is evident that under both datasets, the final fitness values of the proposed model are lower than those of the traditional model. This is attributed to the conventional model’s tendency to reach local optima prematurely during the iteration process. Consequently, it can be concluded that the improvements made to the traditional simulated annealing algorithm—expanding the search range and enhancing the global search capability—effectively overcome the limitations of local entrapment and slow convergence inherent in the conventional model. The proposed model demonstrates rapid convergence, balancing global and regional convergence capabilities, thus avoiding the generation of locally optimal solutions. This confirms the effectiveness of the improvements made to the algorithm.

5. Conclusions

The present study introduces a novel ISA-SE selective ensemble learning model for predicting ship painting man-hours. The base learners are optimized through the MPSO algorithm and data grouping techniques, enhancing their performance. The simulated annealing algorithm is improved by incorporating random perturbation strategies and a parallel search mechanism. Building upon this, the enhanced simulated annealing algorithm is employed as the selection tool, combining base learners to construct a heterogeneous ensemble learning model. Experimental validation is carried out on container ship data, and the results show that the ISA-SE model outperforms traditional single-learner models in prediction accuracy. In comparison with the selective ensemble learning model built on the traditional simulated annealing algorithm, the improved algorithm is verified to effectively avoid local optima and significantly improve the accuracy of ship painting man-hour prediction. This work provides effective support for the subsequent refinement of work assignments and control of the production pace.
In future work, we intend to integrate ship painting man-hour predictions with real-world operational scenarios to guide shipyard scheduling. Despite achieving favorable predictive outcomes in experiments, the extended duration of ship painting projects introduces inherent uncertainties that may lead to deviations in guiding fine-tuned scheduling processes. To enhance practical applicability, further refinement of painting man-hour predictions is imperative, focusing on segment-specific predictions for various ship structures, thereby ensuring a more rigorous outcome.

Author Contributions

H.B. revised and finalized the paper; Z.G. wrote the first draft; X.Z. collected and organized the data; T.Y. validated the paper; H.Z. acquired the funding. All authors reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

The authors gratefully acknowledge the financial support from the Ministry of Industry and Information Technology High-Tech Ship Research Project: Research on the Development and Application of a Digital Process Design System for Ship Coating (No.: MC-202003-Z01-02), the National Defense Basic Scientific Research Project: Research and Development of an Intelligent Methanol-Fueled New Energy Ship (No.: JCKY2021414B011), and the RO-RO Passenger Ship Efficient Construction Process and Key Technology Research (No.: CJ07N20).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this work.

Abbreviations

ISA-SE: Selective ensemble learning based on an improved simulated annealing algorithm
MPSO: Improved particle swarm optimization algorithm
PCA: Principal component analysis
KNN: K-nearest neighbors
RF: Random forest
GBDT: Gradient boosting decision tree
XGBoost: Extreme gradient boosting
SVR: Support vector regression
BPNN: Back propagation neural network
AdaBoost: Adaptive boosting
MLP: Multilayer perceptron
PR: Polynomial regression
ENR: Elastic net regression
MAE: Mean absolute error
RMSE: Root mean square error

References

  1. Almeida, E.; Diamantino, T.C.; Sousa, O.D. Marine paints: The particular case of antifouling paints. Prog. Org. Coat. 2007, 59, 2–20. [Google Scholar] [CrossRef]
  2. Yuan, X.; Bu, H.; Niu, J.; Yu, W.; Zhou, H.; Ji, X.; Ye, P. Coating matching recommendation based on improved fuzzy comprehensive evaluation and collaborative filtering algorithm. Sci. Rep. 2021, 11, 14035. [Google Scholar] [CrossRef]
  3. Sundus, K.I.; Hammo, B.H.; Al-Zoubi, M.B.; Al-Omari, A. Solving the multicollinearity problem to improve the stability of machine learning algorithms applied to a fully annotated breast cancer dataset. IMU 2022, 33, 101088. [Google Scholar] [CrossRef]
  4. Chen, W.; Wang, Q.; Hesthaven, J.S.; Zhang, C. Physics-informed machine learning for reduced-order modeling of nonlinear problems. J. Comput. Phys. 2021, 446, 110666. [Google Scholar] [CrossRef]
  5. Jia, Y.; Gao, M.; Gu, J. Self-learning regression interpolation based on Ricker kernel function for seismic data. Explor. Geophys. 2022, 53, 289–299. [Google Scholar] [CrossRef]
  6. Rodrigues, A.; Silva, F.J.; Sousa, V.; Pinto, A.; Ferreira, L.; Pereira, T. Using an Artificial Neural Network Approach to Predict Machining Time. Metals 2022, 12, 1709. [Google Scholar] [CrossRef]
  7. Alemu, S.K. Construction time prediction model for public building projects. Eng. Constr. Archit. Manag. 2022, 29, 2183–2206. [Google Scholar] [CrossRef]
  8. Hur, M.; Lee, S.; Kim, B.; Cho, S.; Lee, D.; Lee, D. A study on the man-hour prediction system for shipbuilding. J. Intell. Manuf. 2015, 26, 1267–1279. [Google Scholar] [CrossRef]
  9. Bakar, N.N.A.; Bazmohammadi, N.; Çimen, H.; Uyanik, T.; Vasquez, J.C.; Guerrero, J.M. Data-driven ship berthing forecasting for cold ironing in maritime transportation. Appl. Energy 2022, 326, 119947. [Google Scholar] [CrossRef]
  10. Mohsenijam, A.; Lu, M. Framework for developing labour-hour prediction models from project design features: Case study in structural steel fabrication. Can. J. Civil Eng. 2019, 46, 871–880. [Google Scholar] [CrossRef]
  11. Sun, B.; Chen, H.; Wang, J. An empirical margin explanation for the effectiveness of DECORATE ensemble learning algorithm. Knowl. Based Syst. 2015, 78, 1–12. [Google Scholar] [CrossRef]
  12. Wilson, J.; Chaudhury, S.; Lall, B. Homogeneous–Heterogeneous Hybrid Ensemble for concept-drift adaptation. Neurocomputing 2023, 557, 126741. [Google Scholar] [CrossRef]
  13. Wen, S.; Zhang, C.; Lan, H.; Xu, Y.; Tang, Y.; Huang, Y. A hybrid ensemble model for interval prediction of solar power output in ship onboard power systems. IEEE Trans. Sustain. Energy 2019, 12, 14–24. [Google Scholar] [CrossRef]
Figure 1. Selective integration framework diagram.
Figure 2. Flow chart of hyperparameter optimization.
Figure 3. Data grouping flowchart.
Figure 4. Candidate base learner integrated pruning coding structure.
Figure 5. Selective ensemble learning flowchart.
Figure 6. Correlation analysis results.
Figure 7. R2 values of each base learner under the data grouping technique: (a) dataset SH-a; (b) dataset SH-b; (c) dataset SH-c; (d) dataset JN-a; (e) dataset JN-b; (f) dataset JN-c.
Figure 8. Average results of different models (accuracy).
Figure 9. Average results of different models (MAE).
Figure 10. Average results of different models (RMSE).
Figure 11. Comparison between ISA-SE and traditional model predictions and actual values.
Figure 12. Iterative curve of fitness values.
Table 1. Combination of the prediction results of the two learners.

 | h_i correct | h_i incorrect
h_j correct | a | c
h_j incorrect | b | d
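The counts a, b, c, and d in Table 1 are the standard ingredients of pairwise diversity measures used when screening base learners. As a minimal illustration (not necessarily the paper's exact criterion), the sketch below computes the disagreement measure (b + c)/(a + b + c + d) for two regressors; treating a prediction as "correct" when its relative error falls within a 5% tolerance is a hypothetical choice, not a value taken from the paper.

```python
import numpy as np

def disagreement(pred_i, pred_j, y_true, tol=0.05):
    """Pairwise disagreement built from the Table 1 contingency counts.

    A regression prediction is treated as "correct" when its relative
    error is within `tol`; the 5% default is an illustrative assumption.
    """
    correct_i = np.abs(pred_i - y_true) / np.abs(y_true) <= tol
    correct_j = np.abs(pred_j - y_true) / np.abs(y_true) <= tol
    a = np.sum(correct_i & correct_j)      # both learners correct
    b = np.sum(correct_i & ~correct_j)     # h_i correct, h_j incorrect
    c = np.sum(~correct_i & correct_j)     # h_i incorrect, h_j correct
    d = np.sum(~correct_i & ~correct_j)    # both learners incorrect
    return (b + c) / (a + b + c + d)       # higher = more diverse pair
```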
Table 2. Variables and classifications.

Attributes | Factor | Data Type
Human attributes | Age | Continuous variable
Human attributes | Technical level | Ordinal categorical variable
Equipment and material attributes | Steel surface cleanliness | Ordinal categorical variable
Equipment and material attributes | Spraying methods | Nominal categorical variable
Process attributes | Film thickness | Continuous variable
Process attributes | Coating area | Continuous variable
Process attributes | Segmental structure | Nominal categorical variable
Environmental attributes | Temperature | Continuous variable
Environmental attributes | Humidity | Continuous variable
Environmental attributes | Wind force | Continuous variable
Table 3. Variable conversion table.

Factor | Form | Conversion Value
Technical level | Junior worker | 1
Technical level | Intermediate worker | 2
Technical level | Senior worker | 3
Technical level | Technician | 4
Technical level | Senior technician | 5
Steel surface cleanliness | Sa2 | 1
Steel surface cleanliness | Sa2.5 | 2
Steel surface cleanliness | Sa3 | 3
Segmental structure | A | 1
Segmental structure | B | 2
Segmental structure | C | 3
Segmental structure | D | 4
Segmental structure | E | 5
Segmental structure | F | 6
Segmental structure | G | 7
Spraying methods | Brush coating method | 1
Spraying methods | Roller coating method | 2
Spraying methods | Air spray coating | 3
Spraying methods | Airless spray coating | 4
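The conversions in Tables 2 and 3 amount to ordinal and nominal integer coding of the categorical painting factors. A minimal pandas sketch is given below; the DataFrame column names are assumed to match the factor names above and are hypothetical.

```python
import pandas as pd

# Mappings reproduce the conversion values listed in Table 3.
TECH_LEVEL = {"Junior worker": 1, "Intermediate worker": 2,
              "Senior worker": 3, "Technician": 4, "Senior technician": 5}
CLEANLINESS = {"Sa2": 1, "Sa2.5": 2, "Sa3": 3}
SEGMENT = {s: i + 1 for i, s in enumerate("ABCDEFG")}   # A -> 1, ..., G -> 7
SPRAY = {"Brush coating method": 1, "Roller coating method": 2,
         "Air spray coating": 3, "Airless spray coating": 4}

def encode(df: pd.DataFrame) -> pd.DataFrame:
    """Map the categorical painting attributes to the numeric codes of Table 3."""
    out = df.copy()
    out["Technical level"] = out["Technical level"].map(TECH_LEVEL)
    out["Steel surface cleanliness"] = out["Steel surface cleanliness"].map(CLEANLINESS)
    out["Segmental structure"] = out["Segmental structure"].map(SEGMENT)
    out["Spraying methods"] = out["Spraying methods"].map(SPRAY)
    return out
```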
Table 4. Eigenvalues and variance contribution rate.

No. | Eigenvalue | Variance Contribution Rate/% | Cumulative Variance Contribution Rate/%
1 | 14.27 | 51.02 | 51.02
2 | 5.82 | 23.58 | 74.60
3 | 4.23 | 17.31 | 91.91
4 | 0.58 | 5.59 | 97.50
5 | 0.43 | 1.28 | 98.78
n | 0 | 0 | 100
Table 5. Composition matrix.

Data Item | Component 1 | Component 2 | Component 3
Age | 0.328 | −0.597 | 0.553
Technical level | 0.413 | 0.298 | −0.335
Film thickness | 0.732 | −0.445 | 0.012
Coating area | −0.687 | 0.713 | 0.225
Steel surface cleanliness | −0.566 | 0.312 | −0.679
Temperature | 0.132 | −0.377 | 0.028
Spraying method | 0.435 | −0.233 | 0.606
Segmental structure | −0.332 | 0.053 | −0.258
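Tables 4 and 5 are standard principal component analysis (PCA) outputs: per-component variance contributions with their running total, and the loadings of the retained components. The scikit-learn sketch below reproduces both quantities on placeholder data; retaining components until the cumulative contribution reaches about 90% is an inference from Table 4, where three components reach 91.91%.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.rand(200, 10)                 # placeholder for the encoded feature matrix

X_std = StandardScaler().fit_transform(X)   # PCA assumes standardized inputs
pca = PCA().fit(X_std)

var_pct = pca.explained_variance_ratio_ * 100      # Table 4, variance contribution rate
cum_pct = np.cumsum(var_pct)                       # Table 4, cumulative column
n_keep = int(np.searchsorted(cum_pct, 90.0)) + 1   # smallest k with cumulative >= 90%
loadings = pca.components_[:n_keep].T              # Table 5 composition-matrix layout
```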
Table 6. Algorithm parameter settings.

Algorithm | Advantages | Optimization Parameters | Value
K-nearest neighbors (KNN) | Robustness | Number of neighbors, Minkowski distance | 5, 2
Random forest (RF) | Fast training speed, simple computation | Number of trees, minimum samples in leaf nodes | 100, 4
Gradient boosting decision tree (GBDT) | Effectively improves accuracy and efficiency on low-dimensional data | Number of trees, learning rate, subsampling ratio | 100, 0.1, 0.6
Extreme gradient boosting (XGBoost) | Effectively prevents model overfitting | Step size, maximum depth of trees | 0.2, 5
Support vector regression (SVR) | Suitable for small-sample data | Penalty factor, kernel function coefficient | 1.0, 0.001
Back propagation neural network (BPNN) | Achieves nonlinear mapping of inputs and outputs | Hidden layers, neurons in hidden layers | 1, 4
Adaptive boosting (AdaBoost) | No need for feature selection, low tendency to overfit | Number of trees, learning rate | 100, 0.1
Multilayer perceptron (MLP) | Learns complex relationships between features and targets through per-layer activation functions | Hidden layers, neurons in hidden layers | 1, 4
Polynomial regression (PR) | Retains the speed of linear methods while adapting to a wide range of data | Polynomial degree, whether to fit intercept | 3, False
Elastic net regression (ENR) | Handles prediction with many, possibly correlated features via combined L1 and L2 regularization | Penalty coefficient, weight of L1 regularization | 1.0, 0.5
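One plausible scikit-learn/XGBoost realization of the Table 6 settings is sketched below. The class and argument choices are assumptions (the paper does not name its implementation), parameters absent from the table are left at library defaults, and a one-hidden-layer BPNN and the MLP both map to MLPRegressor in this stack.

```python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import (RandomForestRegressor, GradientBoostingRegressor,
                              AdaBoostRegressor)
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from xgboost import XGBRegressor

# Candidate base learners with the hyperparameter values listed in Table 6.
candidates = {
    "KNN": KNeighborsRegressor(n_neighbors=5, p=2),
    "RF": RandomForestRegressor(n_estimators=100, min_samples_leaf=4),
    "GBDT": GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                      subsample=0.6),
    "XGBoost": XGBRegressor(learning_rate=0.2, max_depth=5),
    "SVR": SVR(C=1.0, gamma=0.001),
    "BPNN": MLPRegressor(hidden_layer_sizes=(4,)),
    "AdaBoost": AdaBoostRegressor(n_estimators=100, learning_rate=0.1),
    "MLP": MLPRegressor(hidden_layer_sizes=(4,)),
    "PR": make_pipeline(PolynomialFeatures(degree=3),
                        LinearRegression(fit_intercept=False)),
    "ENR": ElasticNet(alpha=1.0, l1_ratio=0.5),
}
```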
Table 7. Floating values of R2 for different datasets.

Data Set | KNN | RF | GBDT | XGBoost | SVR | BPNN | AdaBoost | MLP | PR | ENR
SH | 0.0694 | 0.0700 | 0.0468 | 0.0500 | 0.0381 | 0.0570 | 0.0743 | 0.0500 | 0.1739 | 0.1049
JN | 0.0760 | 0.0747 | 0.0469 | 0.0704 | 0.0926 | 0.0243 | 0.0406 | 0.0661 | 0.1853 | 0.2066
Bold indicates R2 floating values exceeding 10%.
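Here the R2 floating value is read as the spread (maximum minus minimum) of a learner's R2 across the grouped sub-datasets, which is how the 10% screening threshold that removes PR and ENR can be applied; this reading is an assumption, since the formal definition appears earlier in the paper. A sketch:

```python
from sklearn.base import clone
from sklearn.metrics import r2_score

def r2_floating_value(model, grouped_splits):
    """Spread of R2 across grouped sub-datasets (assumed max - min).

    `grouped_splits` is an iterable of (X_train, y_train, X_test, y_test)
    tuples, one per data group (e.g., SH-a / SH-b / SH-c).
    """
    scores = []
    for X_tr, y_tr, X_te, y_te in grouped_splits:
        m = clone(model).fit(X_tr, y_tr)            # fresh copy per group
        scores.append(r2_score(y_te, m.predict(X_te)))
    return max(scores) - min(scores)                # > 0.10 => unstable, screen out
```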
Table 8. Experimental results of different λ values (accuracy).

Data Set | λ = 0.3 | λ = 0.4 | λ = 0.5 | λ = 0.6 | λ = 0.7
SH-a | 0.9664 ± 0.0254 | 0.9715 ± 0.0159 | 0.9658 ± 0.0298 | 0.9549 ± 0.0229 | 0.9662 ± 0.1588
SH-b | 0.9548 ± 0.0221 | 0.9513 ± 0.0258 | 0.9712 ± 0.0125 | 0.9618 ± 0.0248 | 0.9615 ± 0.0208
SH-c | 0.9513 ± 0.0235 | 0.9410 ± 0.0219 | 0.9456 ± 0.0321 | 0.9480 ± 0.0111 | 0.9488 ± 0.0244
JN-a | 0.9468 ± 0.0235 | 0.9458 ± 0.0255 | 0.9556 ± 0.0206 | 0.9472 ± 0.0216 | 0.9321 ± 0.0199
JN-b | 0.9411 ± 0.0311 | 0.9488 ± 0.0214 | 0.9354 ± 0.0298 | 0.9568 ± 0.0259 | 0.9328 ± 0.0322
JN-c | 0.9602 ± 0.0125 | 0.9654 ± 0.0152 | 0.9771 ± 0.0103 | 0.9545 ± 0.0315 | 0.9511 ± 0.0258
The bold result is the best result.
Table 9. Experimental results of different λ values (MAE).

Data Set | λ = 0.3 | λ = 0.4 | λ = 0.5 | λ = 0.6 | λ = 0.7
SH-a | 0.0554 ± 0.0206 | 0.0528 ± 0.0144 | 0.0459 ± 0.0215 | 0.0435 ± 0.0221 | 0.0550 ± 0.0313
SH-b | 0.0630 ± 0.0176 | 0.0638 ± 0.0299 | 0.0568 ± 0.0116 | 0.0612 ± 0.0208 | 0.0456 ± 0.0213
SH-c | 0.0543 ± 0.0189 | 0.0502 ± 0.0222 | 0.0335 ± 0.0126 | 0.0411 ± 0.0139 | 0.0512 ± 0.0235
JN-a | 0.0599 ± 0.0125 | 0.0412 ± 0.0258 | 0.0549 ± 0.0296 | 0.0421 ± 0.0325 | 0.0497 ± 0.0218
JN-b | 0.0587 ± 0.0266 | 0.0445 ± 0.0213 | 0.0442 ± 0.0205 | 0.0376 ± 0.0226 | 0.0558 ± 0.0115
JN-c | 0.0602 ± 0.0251 | 0.0548 ± 0.0301 | 0.0358 ± 0.0115 | 0.0401 ± 0.0198 | 0.0523 ± 0.0322
The bold result is the best result.
Table 10. Experimental results of different λ values (RMSE).

Data Set | λ = 0.3 | λ = 0.4 | λ = 0.5 | λ = 0.6 | λ = 0.7
SH-a | 0.1699 ± 0.0259 | 0.1528 ± 0.0325 | 0.1486 ± 0.0359 | 0.1615 ± 0.0233 | 0.1658 ± 0.0233
SH-b | 0.1805 ± 0.0312 | 0.1799 ± 0.0412 | 0.1688 ± 0.0241 | 0.1771 ± 0.0169 | 0.1671 ± 0.0219
SH-c | 0.1728 ± 0.0288 | 0.1626 ± 0.0315 | 0.1446 ± 0.0299 | 0.1588 ± 0.0311 | 0.1725 ± 0.0297
JN-a | 0.1667 ± 0.0320 | 0.1502 ± 0.0258 | 0.1559 ± 0.0115 | 0.1699 ± 0.0228 | 0.1687 ± 0.0314
JN-b | 0.1701 ± 0.0289 | 0.1623 ± 0.0322 | 0.1398 ± 0.0225 | 0.1554 ± 0.0339 | 0.1662 ± 0.0288
JN-c | 0.1669 ± 0.0325 | 0.1488 ± 0.0259 | 0.1302 ± 0.0188 | 0.1501 ± 0.0210 | 0.1612 ± 0.0315
The bold result is the best result.
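The paper defines how λ enters the search objective in its methods section; purely to illustrate the kind of trade-off that the 0.3 to 0.7 sweep in Tables 8-10 explores, the hypothetical objective below weights ensemble error against ensemble diversity with a single coefficient. The functional form is an assumption, not the paper's formula; λ = 0.5, the mid-point and the most frequent winner above, is the default.

```python
def fitness(ensemble_error, ensemble_diversity, lam=0.5):
    """Hypothetical trade-off objective for the subset search (minimized).

    `lam` weights prediction error against (negated) pairwise diversity;
    the exact formulation used by ISA-SE is given in the paper itself.
    """
    return lam * ensemble_error - (1.0 - lam) * ensemble_diversity
```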
Table 11. Analysis results of different models (accuracy).

Data Set | KNN | RF | GBDT | XGBoost | SVR | BPNN | AdaBoost | MLP | ISA-SE
SH-a | 0.8254 ± 0.0201 | 0.8676 ± 0.0356 | 0.9012 ± 0.0125 | 0.9456 ± 0.0312 | 0.8325 ± 0.0356 | 0.8565 ± 0.0269 | 0.8845 ± 0.0259 | 0.8256 ± 0.0115 | 0.9658 ± 0.0298
SH-b | 0.8133 ± 0.0189 | 0.8825 ± 0.0169 | 0.8995 ± 0.0234 | 0.9218 ± 0.0216 | 0.8526 ± 0.0126 | 0.8416 ± 0.0139 | 0.8749 ± 0.0154 | 0.8169 ± 0.0255 | 0.9712 ± 0.0125
SH-c | 0.8325 ± 0.0259 | 0.8766 ± 0.0231 | 0.9135 ± 0.0116 | 0.9470 ± 0.0259 | 0.8346 ± 0.0246 | 0.8359 ± 0.0216 | 0.8895 ± 0.0264 | 0.8321 ± 0.0187 | 0.9456 ± 0.0321
JN-a | 0.8451 ± 0.0315 | 0.8825 ± 0.0441 | 0.9256 ± 0.0289 | 0.9218 ± 0.0139 | 0.8561 ± 0.0298 | 0.8421 ± 0.0118 | 0.8754 ± 0.0225 | 0.8009 ± 0.0332 | 0.9556 ± 0.0206
JN-b | 0.8549 ± 0.0154 | 0.8546 ± 0.0359 | 0.9423 ± 0.0321 | 0.9321 ± 0.0116 | 0.8659 ± 0.0169 | 0.8326 ± 0.0223 | 0.8665 ± 0.0355 | 0.8116 ± 0.0235 | 0.9354 ± 0.0298
JN-c | 0.8388 ± 0.0315 | 0.8881 ± 0.0116 | 0.9226 ± 0.0146 | 0.9441 ± 0.0192 | 0.8456 ± 0.0259 | 0.8564 ± 0.0267 | 0.8635 ± 0.0221 | 0.7993 ± 0.0228 | 0.9771 ± 0.0103
The bold result is the best result.
Table 12. Analysis results of different models (MAE).

Data Set | KNN | RF | GBDT | XGBoost | SVR | BPNN | AdaBoost | MLP | ISA-SE
SH-a | 0.1152 ± 0.0356 | 0.0985 ± 0.0224 | 0.0689 ± 0.0206 | 0.0874 ± 0.0225 | 0.1145 ± 0.0355 | 0.1356 ± 0.0488 | 0.1188 ± 0.0230 | 0.1655 ± 0.0388 | 0.0459 ± 0.0215
SH-b | 0.1028 ± 0.0246 | 0.1025 ± 0.0135 | 0.0833 ± 0.0128 | 0.0524 ± 0.0108 | 0.1298 ± 0.0469 | 0.1442 ± 0.0269 | 0.1256 ± 0.0224 | 0.1549 ± 0.0445 | 0.0568 ± 0.0116
SH-c | 0.1385 ± 0.0159 | 0.0898 ± 0.0116 | 0.0882 ± 0.0199 | 0.0699 ± 0.0301 | 0.1144 ± 0.0196 | 0.1334 ± 0.0225 | 0.1298 ± 0.0301 | 0.1502 ± 0.0335 | 0.0335 ± 0.0126
JN-a | 0.1298 ± 0.0114 | 0.0995 ± 0.0112 | 0.0501 ± 0.0216 | 0.0888 ± 0.0160 | 0.1266 ± 0.0325 | 0.1528 ± 0.0321 | 0.1498 ± 0.0220 | 0.1469 ± 0.0277 | 0.0549 ± 0.0296
JN-b | 0.1423 ± 0.0159 | 0.0943 ± 0.0226 | 0.0635 ± 0.0198 | 0.0774 ± 0.0203 | 0.1229 ± 0.0155 | 0.1356 ± 0.0299 | 0.1359 ± 0.0115 | 0.1528 ± 0.0376 | 0.0442 ± 0.0205
JN-c | 0.1388 ± 0.0205 | 0.1028 ± 0.0177 | 0.0826 ± 0.0230 | 0.0806 ± 0.0259 | 0.1180 ± 0.0220 | 0.1440 ± 0.0221 | 0.1302 ± 0.0116 | 0.1550 ± 0.0276 | 0.0358 ± 0.0115
The bold result is the best result.
Table 13. Analysis results of different models (RMSE).

Data Set | KNN | RF | GBDT | XGBoost | SVR | BPNN | AdaBoost | MLP | ISA-SE
SH-a | 0.2031 ± 0.0225 | 0.1835 ± 0.0258 | 0.1602 ± 0.0235 | 0.1688 ± 0.0203 | 0.2133 ± 0.0244 | 0.2294 ± 0.0196 | 0.2366 ± 0.0218 | 0.2859 ± 0.0350 | 0.1486 ± 0.0359
SH-b | 0.2155 ± 0.0197 | 0.1952 ± 0.0115 | 0.1590 ± 0.0255 | 0.1720 ± 0.0116 | 0.2298 ± 0.0203 | 0.2046 ± 0.0288 | 0.2259 ± 0.0325 | 0.2695 ± 0.0156 | 0.1688 ± 0.0241
SH-c | 0.2106 ± 0.0206 | 0.1820 ± 0.0231 | 0.1552 ± 0.0169 | 0.1602 ± 0.0213 | 0.2259 ± 0.0304 | 0.2315 ± 0.0425 | 0.2461 ± 0.0229 | 0.2777 ± 0.0226 | 0.1446 ± 0.0299
JN-a | 0.1995 ± 0.0266 | 0.1520 ± 0.0129 | 0.1620 ± 0.0231 | 0.1743 ± 0.0196 | 0.2006 ± 0.0146 | 0.2589 ± 0.0256 | 0.2650 ± 0.0224 | 0.2889 ± 0.0328 | 0.1559 ± 0.0115
JN-b | 0.2106 ± 0.0321 | 0.1706 ± 0.0155 | 0.1568 ± 0.0315 | 0.1622 ± 0.0199 | 0.2115 ± 0.0287 | 0.2459 ± 0.0321 | 0.2699 ± 0.0305 | 0.2961 ± 0.0222 | 0.1398 ± 0.0225
JN-c | 0.2256 ± 0.0288 | 0.1809 ± 0.0235 | 0.1659 ± 0.0221 | 0.1658 ± 0.0105 | 0.2299 ± 0.0564 | 0.2460 ± 0.0255 | 0.2587 ± 0.0325 | 0.2826 ± 0.0341 | 0.1302 ± 0.0188
The bold result is the best result.
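For reference, the three reported indicators can be computed as follows. Treating "accuracy" for this regression task as 1 minus the mean relative error is an assumption, consistent with values close to 1; MAE and RMSE follow their standard definitions.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Accuracy, MAE, and RMSE for man-hour predictions.

    Accuracy is assumed to be 1 - mean relative error (the paper's own
    definition appears in its methods section).
    """
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    acc = 1.0 - np.mean(np.abs(err) / np.abs(y_true))
    return acc, mae, rmse
```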
Table 14. Result analysis of two models under three evaluation indicators: accuracy, MAE, and RMSE.

Data Set | ISA-SE Accuracy | ISA-SE MAE | ISA-SE RMSE | Traditional Accuracy | Traditional MAE | Traditional RMSE
SH set | 0.9609 ± 0.0248 | 0.0454 ± 0.0152 | 0.1540 ± 0.0299 | 0.9125 ± 0.0388 | 0.0856 ± 0.0225 | 0.2035 ± 0.0215
JN set | 0.9560 ± 0.0202 | 0.0450 ± 0.0205 | 0.1420 ± 0.0176 | 0.9028 ± 0.0456 | 0.0956 ± 0.0312 | 0.1889 ± 0.0258
Table 15. Result analysis of two models under three evaluation indicators: iteration count, number of base learners, and time.

Data Set | ISA-SE Iteration Count | ISA-SE Base Learners | ISA-SE Time (s) | Traditional Iteration Count | Traditional Base Learners | Traditional Time (s)
SH set | 50 | 4 | 62 | 62 | 4 | 70
JN set | 42 | 4 | 48 | 52 | 5 | 56
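To make the Table 15 comparison concrete, a minimal simulated-annealing subset search over a binary inclusion mask is sketched below. The single bit-flip perturbation, geometric cooling schedule, and all parameter values are illustrative assumptions; the paper's ISA additionally injects random perturbations and uses a parallel perturbation search mechanism, both omitted here for brevity.

```python
import numpy as np

def isa_select(fitness_fn, n_learners, iters=50, t0=1.0, alpha=0.95, seed=0):
    """Minimal simulated-annealing sketch of the selective-ensemble search.

    `fitness_fn(mask)` scores a binary inclusion vector over the candidate
    base learners (lower is better).
    """
    rng = np.random.default_rng(seed)
    mask = rng.integers(0, 2, n_learners)           # random initial subset
    if mask.sum() == 0:
        mask[0] = 1                                 # guarantee a non-empty ensemble
    f = fitness_fn(mask)
    best, best_f, t = mask.copy(), f, t0
    for _ in range(iters):
        cand = mask.copy()
        cand[rng.integers(n_learners)] ^= 1         # flip one inclusion bit
        cf = fitness_fn(cand)
        if cf < f or rng.random() < np.exp((f - cf) / t):   # Metropolis acceptance
            mask, f = cand, cf
            if f < best_f:
                best, best_f = mask.copy(), f
        t *= alpha                                  # geometric cooling
    return best, best_f
```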