3.3.3. Defect Prediction Model for Titanium Alloy Pump Body Casting Based on PSO-BP Neural Network
The BP neural network is a multi-layer feedforward network trained by error backpropagation. It is widely used in practical applications because of its simple structure and ease of use [25], which makes it well suited to modeling the complex relationship between investment casting process parameters and casting defects.
However, for the small-sample, highly nonlinear casting defect prediction scenario in this study, the standard BP neural network has a prominent limitation: it is extremely sensitive to its initial weights and thresholds. Once the network structure is fixed, training from randomly initialized weights and thresholds makes the network prone to falling into local optima during gradient descent, resulting in large prediction deviations and poor generalization to unseen data. To address this problem, the PSO algorithm, which has strong global search capability, is introduced to optimize the initial weights and thresholds of the BP neural network before formal training. Through the global iterative search of the particle swarm, the initial parameter combination that minimizes the network prediction error is obtained. This reduces the randomness of initial parameter setting in the traditional BP model and improves the learning efficiency, prediction precision, and generalization stability of the network in this defect prediction task [26,27].
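The PSO pre-optimization step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the 3-5-1 layout, 40 particles, 100 iterations, and learning factors [2.05, 2.05] follow the settings reported in Section 3.3.4, while the constriction coefficient `chi`, the initialization range, and the function names are assumptions introduced for the example.

```python
import numpy as np

N_IN, N_HIDDEN = 3, 5                              # 3 process parameters, 5 hidden neurons
DIM = N_HIDDEN * N_IN + N_HIDDEN + N_HIDDEN + 1    # all weights and thresholds of a 3-5-1 net

def network_mse(theta, X, y):
    """Mean squared error of a 3-5-1 network whose parameters are packed in theta."""
    k = N_HIDDEN * N_IN
    W1 = theta[:k].reshape(N_HIDDEN, N_IN)         # input-to-hidden weights
    b1 = theta[k:k + N_HIDDEN]                     # hidden thresholds
    W2 = theta[k + N_HIDDEN:k + 2 * N_HIDDEN]      # hidden-to-output weights
    b2 = theta[-1]                                 # output threshold
    pred = np.tanh(X @ W1.T + b1) @ W2 + b2
    return float(np.mean((pred - y) ** 2))

def pso_initial_weights(X, y, n_particles=40, n_iter=100, c1=2.05, c2=2.05, seed=0):
    """Search for initial weights/thresholds that minimize the training MSE."""
    chi = 0.7298                                   # assumed constriction coefficient
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, (n_particles, DIM))   # assumed initialization range
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_err = np.array([network_mse(p, X, y) for p in pos])
    best = int(np.argmin(pbest_err))
    gbest, gbest_err = pbest[best].copy(), pbest_err[best]
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # cognitive term pulls toward each particle's best, social term toward the swarm's best
        vel = chi * (vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos))
        pos = pos + vel
        err = np.array([network_mse(p, X, y) for p in pos])
        improved = err < pbest_err
        pbest[improved], pbest_err[improved] = pos[improved], err[improved]
        best = int(np.argmin(pbest_err))
        if pbest_err[best] < gbest_err:
            gbest, gbest_err = pbest[best].copy(), pbest_err[best]
    return gbest, float(gbest_err)
```

The returned vector would then serve as the starting point for conventional BP training; because the global best is only ever replaced by a better candidate, its error cannot increase from one iteration to the next.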
In this study, three key investment casting process parameters that have a significant impact on casting defects (pouring temperature, pouring time, and shell preheating temperature) are used as the input variables of the network. The two core defect quality indicators (total casting defect volume and shrinkage porosity/void volume) are the output variables of two independent prediction models. Accordingly, the input layer of each BP neural network has 3 nodes, matching the number of input process parameters, and the output layer has 1 node, corresponding to a single defect indicator. The Levenberg–Marquardt (LM) algorithm is selected as the training algorithm for two reasons: first, it offers fast convergence and high solution precision for medium-scale nonlinear regression tasks such as the one in this study; second, it is compatible with the initial weights and thresholds optimized by the PSO algorithm, ensuring the stability and efficiency of the subsequent network training process.
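For readers unfamiliar with the LM algorithm, a single update step can be sketched as below. This is a generic textbook form, not the training code of this study; `residual_fn`, `jacobian_fn`, and the damping value `mu` are hypothetical names and inputs chosen for illustration.

```python
import numpy as np

def lm_step(theta, residual_fn, jacobian_fn, mu):
    """One Levenberg-Marquardt update: theta - (J^T J + mu I)^-1 J^T r."""
    r = residual_fn(theta)                  # residual vector (prediction minus target)
    J = jacobian_fn(theta)                  # Jacobian of the residuals w.r.t. theta
    A = J.T @ J + mu * np.eye(theta.size)   # damped Gauss-Newton matrix
    return theta - np.linalg.solve(A, J.T @ r)
```

A small `mu` makes the step behave like Gauss-Newton, while a large `mu` shortens it toward a scaled gradient-descent step, which is why the method tends to converge quickly once it starts from a reasonable initial point.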
The dataset for the 3 input parameters (pouring temperature, pouring time, and shell preheating temperature) combines an L25(5³) orthogonal experiment with 97 sets of random samples. The orthogonal experiment (25 sets) uniformly covers the 5 levels of the 3 parameters, ensuring that the parameter space is representatively sampled without missing key combinations. The 97 sets of random samples fill the “parameter gaps” not covered by the orthogonal experiment, so that the dataset covers the complete parameter domain.
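One way to assemble such a design is sketched below. It is an illustration only: the orthogonal array is built with the standard modular construction (columns a, b, and (a + b) mod 5), the level values passed in are placeholders because the actual parameter ranges are not restated here, and uniform sampling is an assumption about how the random samples are drawn.

```python
import numpy as np

def l25_three_factors():
    """25 runs x 3 factors, 5 levels each: columns (a, b, (a + b) mod 5)."""
    return np.array([(a, b, (a + b) % 5) for a in range(5) for b in range(5)])

def build_design(levels, n_random=97, seed=0):
    """Stack the 25 orthogonal runs with uniformly sampled random runs.

    levels: array of shape (3, 5) holding the 5 level values of each factor
            (placeholder values; substitute the real process levels).
    """
    levels = np.asarray(levels, dtype=float)
    idx = l25_three_factors()
    orthogonal = np.stack([levels[j, idx[:, j]] for j in range(3)], axis=1)
    rng = np.random.default_rng(seed)
    low, high = levels.min(axis=1), levels.max(axis=1)
    random_part = rng.uniform(low, high, size=(n_random, 3))  # fills gaps between levels
    return np.vstack([orthogonal, random_part])

# Normalized placeholder levels in [0, 1] for all three factors
design = build_design(np.tile(np.linspace(0.0, 1.0, 5), (3, 1)))
```

In this construction every pair of factors sees each of the 25 level combinations exactly once, which is the balance property that makes the 25 orthogonal runs representative of the three-factor space.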
Regarding the activation functions, as shown in Figure 7, the hyperbolic tangent sigmoid function (tansig) is used for the hidden layer, and a linear function (purelin) is adopted for the output layer, which is appropriate for regression problems.
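The two activation functions and the resulting forward pass can be written out as follows. This is a minimal sketch: the function names mirror the tansig/purelin terminology above, and the weight shapes assume a 3-input, single-output network with an arbitrary number of hidden neurons.

```python
import numpy as np

def tansig(x):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2x)) - 1, identical to tanh(x)."""
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def purelin(x):
    """Linear (identity) output activation, leaving the regression output unbounded."""
    return x

def forward(X, W1, b1, W2, b2):
    """X: (n, 3); W1: (h, 3); b1: (h,); W2: (1, h); b2: (1,). Returns (n, 1)."""
    hidden = tansig(X @ W1.T + b1)
    return purelin(hidden @ W2.T + b2)
```

The bounded hidden activation supplies the nonlinearity, while the linear output lets the predicted defect volume take any real value rather than being squashed into (-1, 1).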
The essence of neural network training lies in continuously adjusting the weights and thresholds to gradually reduce the network’s output error [28]. In this study, the Particle Swarm Optimization (PSO) algorithm is employed to optimize the initial weights and thresholds of the BP neural network. This approach aims to identify the optimal initial parameters, thereby mitigating the local optima problem often encountered in traditional BP networks due to random parameter initialization.
To match the requirements of defect prediction for titanium alloy pump body castings, the parameter ranges and implementation logic of the sensitivity analysis must be determined with respect to the nonlinear “process parameters–defect volume” mapping. Quantitative analysis is then used to screen the optimal parameter combinations, ensuring the prediction precision and robustness of the model in actual production scenarios.
3.3.4. Single-Factor Sensitivity Analysis of Model Hyperparameters
In the context of defect prediction for titanium alloy pump body castings, the influence of the individual hyperparameters on model performance falls into clearly distinct levels, as shown in Figure 8.
Based on the sensitivity coefficient classification criteria, the model’s parameters can be divided into three sensitivity levels: high, medium, and low. The only high-sensitivity parameter is the PSO population size (0.2077), whose sensitivity coefficient exceeds 0.15 and which has the most significant impact on the model’s prediction performance. A slight change in this parameter can trigger drastic fluctuations in the model’s goodness of fit and prediction performance. When the PSO population size increases from 10 to 30, the coefficient of determination (R²) of the test set plummets from 0.8321 to 0.6671, while the root mean square error (RMSE) changes only slightly, from 0.5868 to 0.5612. In this range, the increase in population size leads to redundant particle optimization and premature convergence of the algorithm, so that the model’s fitting precision for the input-output mapping declines significantly and the search becomes trapped in a local optimum. However, when the population size increases from 30 to 40, R² rebounds strongly to 0.8291 and the RMSE drops to 0.5446, the lowest value in the entire interval, so that optimization precision and fitting quality are both at or near their best. When the population size exceeds 40, R² falls back to 0.7860, model performance continues to deteriorate, and the computational load increases significantly. Weighing fitting quality, optimization precision, and computational efficiency, a population size of 40 is selected as the optimal value, achieving “sufficient global optimization–controllable computational cost–avoidance of local optima”.
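The two test-set metrics used throughout this analysis can be computed as in the sketch below. The standard textbook definitions are assumed here, since the exact formulas are not restated in this section, and the function names are chosen for illustration.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the predicted quantity."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Because R² is normalized by the variance of the test targets while RMSE is an absolute error, the two metrics need not move in step, which is consistent with R² changing sharply over a range in which RMSE barely moves.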
The medium-sensitivity parameters are the number of hidden neurons (0.1328) and the BP learning rate (0.1011), with sensitivity coefficients between 0.10 and 0.15. Their values directly affect the neural network’s fitting ability, convergence stability, and predictive power. Changes in these parameters have a noticeable impact on model performance but carry no risk of drastic failure.
When the number of neurons increases from 3 to 5, the coefficient of determination (R²) of the test set rises from 0.7872 to 0.8309, and the root mean square error (RMSE) decreases from 0.6608 to 0.5699. The additional neurons expand the nonlinear mapping capability of the network and significantly improve the fitting precision of the model. When the number of neurons increases from 5 to 8, R² drops to 0.7409, the lowest value in the entire interval, as the imbalanced network structure temporarily degrades the fitting ability. When the number of neurons increases further from 8 to 12, R² climbs to its maximum of 0.8393 and the RMSE drops to its minimum of 0.5168, so that the fitting and prediction performance of the model reach their best levels. Beyond 12 neurons, the increase in R² is less than 0.005, the decrease in RMSE levels off, the improvement in model performance saturates, and there is a risk of overfitting from redundant neurons. Fitting ability, resistance to overfitting, and computational efficiency must be weighed together: although increasing the number of neurons from 5 to 12 slightly improves the fitting precision (R² increases by only 0.0084), it significantly increases the number of network parameters and the training time (doubling the number of neurons increases the computational load by about 60%). With 5 neurons, the model already meets the fitting precision required for engineering applications (R² = 0.8309, RMSE = 0.5699), with the best computational efficiency and no risk of overfitting. Therefore, the optimal number of hidden neurons is determined to be 5.
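The trade-off between 5 and 12 hidden neurons can be made concrete by counting trainable parameters. The sketch below assumes a single hidden layer with 3 inputs and 1 output, as described in Section 3.3.3; parameter count is used only as a simple proxy for model size and is not the timing measurement behind the cost estimate quoted above.

```python
def n_parameters(n_hidden, n_in=3, n_out=1):
    """Weights plus thresholds of a single-hidden-layer network."""
    hidden_layer = n_hidden * n_in + n_hidden   # input weights + hidden thresholds
    output_layer = n_out * n_hidden + n_out     # output weights + output thresholds
    return hidden_layer + output_layer

params_5 = n_parameters(5)                  # 26 trainable parameters
params_12 = n_parameters(12)                # 61 trainable parameters
r2_gain = round(0.8393 - 0.8309, 4)         # 0.0084, from the reported test-set values
```

With a dataset of this size, more than doubling the number of trainable parameters for an R² gain of 0.0084 supports the preference for the smaller network.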
For the BP learning rate: when the learning rate increases from 0.005 to 0.050, the test set R² falls steadily from 0.8576 to 0.7738, and the RMSE rises from 0.5154 to 0.5816. Intermediate learning rates cause imbalanced gradient updates in the network, resulting in prediction oscillations in critical intervals and a significant decline in convergence stability. When the learning rate increases further from 0.050 to 0.100, R² rebounds strongly to 0.8571, returning to nearly the optimal level. At a learning rate of 0.005, R² reaches the highest value among all parameters and the RMSE is the lowest in the entire interval, with the model converging stably without oscillations and achieving the best prediction precision. Therefore, the optimal BP learning rate is determined to be 0.005.
The low-sensitivity parameters are the PSO learning factors (0.0507) and the PSO number of iterations (0.0219), with sensitivity coefficients below 0.10. Different values of these parameters have only a weak impact on model performance. As these parameters vary, the model’s goodness of fit and error remain highly stable, leaving limited room for adjustment and optimization.
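The three-level classification can be expressed directly in code. In the sketch below the coefficients are the reported values, the thresholds are the 0.10 and 0.15 limits stated in the text, and the treatment of values falling exactly on a threshold (assigned to the medium level) is an assumption.

```python
def sensitivity_level(coefficient):
    """Map a sensitivity coefficient to the high / medium / low levels used in the text."""
    if coefficient > 0.15:
        return "high"
    if coefficient >= 0.10:      # boundary handling is an assumption
        return "medium"
    return "low"

coefficients = {
    "PSO population size": 0.2077,
    "number of hidden neurons": 0.1328,
    "BP learning rate": 0.1011,
    "PSO learning factors": 0.0507,
    "PSO number of iterations": 0.0219,
}
levels = {name: sensitivity_level(value) for name, value in coefficients.items()}
```

Ranking the parameters this way indicates where tuning effort is best spent: the population size first, the two network parameters second, and the remaining PSO settings last.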
The results for the PSO number of iterations show that when the number of iterations increases from 50 to 100, the test set R² rises slightly from 0.7842 to 0.7906, and the RMSE decreases from 0.6565 to 0.5488. The additional iterations allow the particle swarm to converge fully, providing better initial weights for the BP network. Between 100 and 150 iterations, R² and RMSE hardly change: the curve levels off, the fitness converges completely, and the performance reaches a saturated, stable state. When the number of iterations increases beyond 150 to 200, R² decreases slightly to 0.7733 and the RMSE rises back to 0.5795. Excessive iterations cause redundant oscillations of the algorithm, resulting in slight performance degradation and wasted computational resources. Therefore, 100 iterations is selected as the optimal PSO number of iterations.
Based on the studies by Clerc M. and Wang Dongfeng et al. [29,30], four different combinations of particle swarm learning factors were set up for testing. The test results show that the fluctuation range of the R² value on the test set across the different parameter combinations is only 0.0420, indicating negligible differences in model performance. Among them, the balanced cognitive and social learning factor combination [2.05, 2.05] achieves the best performance, with a corresponding R² of 0.8289 and a root mean square error (RMSE) as low as 0.5163. The combination [1.50, 2.50], which emphasizes social learning, and the combination [2.50, 1.50], which emphasizes individual cognitive learning, achieve R² values of 0.8217 and 0.8125, respectively, with performance close to that of the optimal combination. The [3.0, 3.0] combination, in which both learning factors are excessively large, yields the worst performance, with its R² dropping to 0.7869. The balanced combination reconciles the global exploration and local exploitation capabilities of the algorithm, thus achieving optimal stability.
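The comparison of the four learning factor combinations reduces to a short calculation, sketched below with the reported test-set R² values; the variable names are chosen for illustration.

```python
# Test-set R2 for each (cognitive, social) learning factor pair, as reported above
r2_by_factors = {
    (2.05, 2.05): 0.8289,
    (1.50, 2.50): 0.8217,
    (2.50, 1.50): 0.8125,
    (3.00, 3.00): 0.7869,
}

best_factors = max(r2_by_factors, key=r2_by_factors.get)            # pair with the highest R2
r2_spread = max(r2_by_factors.values()) - min(r2_by_factors.values())  # fluctuation range
```

The spread evaluates to 0.0420, matching the fluctuation range quoted in the text, and most of it comes from the [3.0, 3.0] combination: the other three combinations differ by less than 0.02.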