3. ANN Core Loss Prediction Model
This paper adopts a multi-layer perceptron (MLP)-type Artificial Neural Network, whose structure is shown in Figure 4. It consists of an input layer, two hidden layers, and an output layer [16]. The input layer contains three features: temperature, frequency, and magnetic flux density. The output layer has a single output variable, core loss, and therefore contains only one neuron.
The core formulas of the ANN revolve around three steps: forward propagation (computing predicted values), the loss function (measuring prediction error), and backpropagation (updating parameters). The features need to be standardized before input:

$$m_0 = \frac{X - \mu}{\sigma}$$

where $X$ is the input feature matrix, $m_0$ is the standardized feature matrix output by this step, $\mu$ is the feature mean, and $\sigma$ is the feature standard deviation. The symbols that appear subsequently are shown in Table 2.
Among them, l = 1, 2, and 3 correspond to the connections from the input layer to Hidden Layer 1, from Hidden Layer 1 to Hidden Layer 2, and from Hidden Layer 2 to the output layer, respectively.
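As an illustration, a minimal Python sketch of this standardization step is given below. It uses scikit-learn's StandardScaler, which applies exactly the (X − μ)/σ transform above; the variable names and example values are illustrative only and are not taken from the paper's dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative feature matrix X: columns are temperature, frequency, flux density.
X_train = np.array([[25.0,  50e3, 0.10],
                    [50.0, 100e3, 0.15],
                    [90.0, 200e3, 0.20]])

scaler = StandardScaler()               # learns mu and sigma for each feature column
m0 = scaler.fit_transform(X_train)      # m0 = (X - mu) / sigma, the standardized features
print(m0.mean(axis=0), m0.std(axis=0))  # approximately zero means and unit standard deviations
```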
Forward propagation is the process in which data flows from the input layer to the output layer, and feature mapping is achieved through linear transformation and nonlinear activation. The calculation from the standardized features to the first hidden layer is as follows:

$$m_1 = g\left(W_1 m_0 + b_1\right)$$

where $g(\cdot)$ denotes the activation function.
The calculation from the first hidden layer to the second hidden layer is a key step in which the features move from preliminary extraction to in-depth abstraction, and the specific calculation is as follows:

$$m_2 = g\left(W_2 m_1 + b_2\right)$$
The output $m_2$ of the second hidden layer flows into the output layer, where these abstract features are converted into a specific predicted value. The calculation of the output layer is as follows:

$$\hat{y}_{\log} = W_3 m_2 + b_3$$
where $\hat{y}_{\log}$ is the predicted value on a logarithmic scale. Because the target is trained on a logarithmic scale, $\hat{y}_{\log}$ must be converted back through the inverse of the logarithmic transformation to obtain $\hat{y}$, the predicted value on the real scale.
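A minimal NumPy sketch of this forward pass is shown below. It assumes a ReLU hidden-layer activation, a linear output neuron, and a natural-log target transform; the paper's actual activation function and logarithm base are defined by its own equations, so these choices, like the random parameter values, are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    """Assumed hidden-layer activation (illustrative choice)."""
    return np.maximum(z, 0.0)

# Illustrative parameter shapes for a 3-26-13-1 network.
W1, b1 = rng.normal(size=(26, 3)),  np.zeros(26)
W2, b2 = rng.normal(size=(13, 26)), np.zeros(13)
W3, b3 = rng.normal(size=(1, 13)),  np.zeros(1)

def forward(m0):
    """Forward propagation: standardized features -> log-scale prediction."""
    m1 = relu(W1 @ m0 + b1)   # input layer -> hidden layer 1
    m2 = relu(W2 @ m1 + b2)   # hidden layer 1 -> hidden layer 2
    y_log = W3 @ m2 + b3      # hidden layer 2 -> output (logarithmic scale)
    return y_log

m0 = rng.normal(size=3)                # one standardized sample
y_hat = np.exp(forward(m0))            # back to the real scale (assuming a natural log)
```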
The loss function measures the difference between the model’s predicted values and the true values, and it provides a clear optimization objective (minimizing the loss) for parameter updates. The loss function used in the ANN combines the Mean Squared Error (MSE) with a regularization term:

$$\mathrm{Loss} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2 + \xi\,\lVert W \rVert^2$$

The first term is the MSE and the second is the regularization term, where $\hat{y}_i$ is the model’s predicted value on the real scale for the $i$th sample, $y_i$ is the true value of the $i$th sample, $\lVert W \rVert^2$ denotes the squared norm of the weight matrix, and $\xi$ is the regularization coefficient.
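The following short sketch evaluates this loss for a batch of predictions. The regularization term is written as the squared norm summed over all weight matrices, which is one common reading of the penalty described above; the paper's exact definition may differ, so the function is only illustrative.

```python
import numpy as np

def ann_loss(y_pred, y_true, weights, xi=1e-4):
    """Mean squared error plus an L2 (squared-norm) penalty on the weight matrices."""
    mse = np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)
    l2 = sum(np.sum(W ** 2) for W in weights)   # squared norm of the weights
    return mse + xi * l2
```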
Parameter update is the specific learning process of the ANN. Based on the gradient of the loss function, it adjusts the weights and biases to reduce the loss [19]. The core formula for the parameter update is as follows:

$$\theta_{t+1} = \theta_t - \eta\,\nabla_{\theta_t}\mathrm{Loss}$$

where $\theta$ generally refers to the parameters of the model, such as $W_1$, $b_1$, $W_2$, and $b_2$; $t$ is the iteration index; and $\eta$ is the learning rate, which controls the step size: if it is too large, the minimum of the loss is easily overshot, and if it is too small, convergence becomes slow. $\nabla_{\theta_t}\mathrm{Loss}$ is the gradient of the loss function with respect to the parameter $\theta_t$, and it determines the direction and magnitude of the weight and bias adjustments. The gradient of the ANN is obtained by the backpropagation (BP) algorithm, which applies the chain rule backward from the output layer to the input layer.
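In code, the plain gradient-descent rule above amounts to the one-line update sketched below; the Adam optimizer actually used in this paper adds adaptive, per-parameter step sizes on top of this basic rule, so this is only the underlying idea.

```python
def gradient_step(params, grads, eta=0.005):
    """One update theta_{t+1} = theta_t - eta * gradient, applied to every parameter array."""
    return {name: theta - eta * grads[name] for name, theta in params.items()}
```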
The termination condition for the gradient descent algorithm in this paper is that the loss function does not decrease for 20 consecutive iterations. Meanwhile, to prevent infinite training, the maximum number of iterations is set to 500. The hyperparameters of the ANN in this paper—including the optimizer, learning rate, batch size, and neuron configuration—are determined through an empirical trial-and-error method. Specifically, while keeping other hyperparameters unchanged, each hyperparameter is adjusted individually, and multiple candidate values are tested to identify the configuration that achieves the lowest validation loss and stable convergence. The Adam optimizer is selected because it exhibits faster and more stable convergence compared with SGD and RMSProp. The learning rate is tested within the range of 0.001–0.01, and 0.005 is finally chosen. The batch size is adjusted among 16, 32, and 64, with 32 selected in the end. For the neuron configuration, multiple combinations are evaluated; the results show that the network with two hidden layers (containing 26 and 13 neurons, respectively) achieves the lowest validation error without overfitting.
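Under the scikit-learn framework already referenced for the MLP regression model, this hyperparameter configuration corresponds roughly to the MLPRegressor settings sketched below. Note that scikit-learn's early stopping monitors the validation score rather than the raw training loss, so this is an approximation of the stopping rule described above rather than the authors' exact training script.

```python
from sklearn.neural_network import MLPRegressor

ann = MLPRegressor(
    hidden_layer_sizes=(26, 13),  # two hidden layers with 26 and 13 neurons
    solver="adam",                # Adam optimizer
    learning_rate_init=0.005,     # selected learning rate
    batch_size=32,                # selected mini-batch size
    max_iter=500,                 # cap on the number of iterations
    early_stopping=True,          # stop when the validation score stops improving...
    n_iter_no_change=20,          # ...for 20 consecutive iterations
    random_state=42,
)
# ann.fit(X_train_std, y_train_log)  # fit on standardized features and log-scale targets
```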
Three evaluation metrics are used to evaluate the ANN model, which are the Mean Absolute Percentage Error (MAPE), Coefficient of Determination (R2), and an additional metric named prediction accuracy (PA).
The Mean Absolute Percentage Error (MAPE) is the average of the absolute errors between the predicted and true values expressed as a proportion of the true values, and it reflects the relative error of the prediction results:

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \times 100\%$$

where $\hat{y}_i$ is the predicted value, $y_i$ is the true value, and $n$ is the total number of predictions. The larger the MAPE, the greater the difference between the predicted and actual values, and the lower the accuracy.
R2 measures the degree of fit of the regression model to the data, that is, the proportion of the variation in the target variable that the model can explain.
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $\bar{y}$ is the mean of the true values. The closer R2 is to 1, the better the fit of the model.
Prediction accuracy (PA) is obtained by constructing an error interval centered on each true value and computing the ratio of the number of predicted values that fall within their corresponding intervals to the total number of predictions; it gives an intuitive view of the model’s prediction performance under a given accuracy requirement. The error interval is defined as [true value × 90%, true value × 110%]:

$$\mathrm{PA} = \frac{k}{n} \times 100\%$$

where $k$ is the number of predicted values falling within their corresponding intervals. The higher the PA, the more predicted values are concentrated around the true values.
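The three metrics can be computed directly from the predicted and true values, as in the sketch below. It assumes NumPy arrays of real-scale values and evaluates R2 in the usual 1 − SS_res/SS_tot form; the helper name is illustrative.

```python
import numpy as np

def evaluate(y_pred, y_true):
    """Return MAPE (%), R^2, and PA (%) for real-scale predictions."""
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    mape = np.mean(np.abs((y_pred - y_true) / y_true)) * 100.0
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    # PA: share of predictions inside [90%, 110%] of the corresponding true value.
    inside = np.abs(y_pred - y_true) <= 0.10 * np.abs(y_true)
    pa = np.mean(inside) * 100.0
    return mape, r2, pa
```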
4. Results Comparison
4.1. Comparison with Other Models
The data of the four materials are divided into a training set, a validation set, and a test set in proportions of 70%, 15%, and 15%, respectively. The training set is the main dataset from which the model learns the data patterns; the validation set is used to evaluate the model’s performance during training and to adjust the hyperparameters accordingly; and the test set is used to evaluate the model’s generalization ability after training and optimization are completed.
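A 70/15/15 split of this kind can be produced, for example, with two successive calls to scikit-learn's train_test_split, as sketched below; the dummy arrays and variable names are illustrative, not the paper's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data: rows of (temperature, frequency, flux density) and core loss targets.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 3)), rng.normal(size=1000)

# First carve off 30% of the samples, then split that portion half-and-half
# into validation and test sets, giving a 70/15/15 partition.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=42)
```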
It should be noted that although traditional k-fold cross-validation was not performed within a single dataset, the proposed Artificial Neural Network (ANN) model was trained and tested independently on four different magnetic materials. This design provides a form of material-based external validation that can effectively evaluate the model’s robustness across different material domains. The consistent predictive performance (MAPE, R2, and PA) observed across the four materials further supports the reliability of the adopted validation strategy.
First, the accuracy of the ANN core loss prediction model is compared with and without logarithmic transformation of the research data. The prediction results of the core loss of the four materials are shown in
Table 3.
Table 3 shows that for the ANN model using logarithmic data processing, the maximum PA and minimum MAPE are 93.33% and 4.56%, respectively; for the ANN model without logarithmic transformation processing, the maximum PA and minimum MAPE are 74.44% and 8.11%, respectively. For the four materials, the PA of the ANN model with logarithmic processing is higher than that of the model without logarithmic processing, and MAPE is also smaller than that of the model without logarithmic processing. Logarithmic transformation of the data can improve the accuracy of the ANN model.
To further verify the prediction effect of the ANN core loss prediction model, the Steinmetz Equation (SE) model and K-nearest neighbor (KNN) algorithm model are used to predict core loss for comparison, with the main evaluation indicators for comparison being MAPE, R2, and PA.
The SE model is a classic core loss prediction model. Under sinusoidal excitation, the core loss calculation formula of the SE model is

$$P = k_1 f^{\alpha_1} B_m^{\beta_1}$$

where $P$ is the core loss; $f$ is the frequency; $B_m$ is the peak value of the magnetic flux density; and $k_1$, $\alpha_1$, and $\beta_1$ are coefficients fitted from the experimental data, generally with $1 < \alpha_1 < 3$ and $2 < \beta_1 < 3$. The formula indicates that the core loss per unit volume (core loss density) $P$ depends on power functions of the frequency $f$ and the peak magnetic flux density $B_m$.
Bm. The empirical coefficients
k1,
α1, and
β1 in the SE model are generally fitted using experimental data. For the four materials, the fitting results are shown in
Table 4.
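Because the SE model is linear in log space (log P = log k1 + α1 log f + β1 log Bm), its coefficients can be fitted by ordinary least squares on the logarithms of the measurements, as in the sketch below. This is one standard way to obtain k1, α1, and β1, not necessarily the fitting procedure used to produce Table 4.

```python
import numpy as np

def fit_steinmetz(f, Bm, P):
    """Fit P = k1 * f**alpha1 * Bm**beta1 by least squares in log space."""
    f, Bm, P = np.asarray(f, float), np.asarray(Bm, float), np.asarray(P, float)
    A = np.column_stack([np.ones_like(f), np.log(f), np.log(Bm)])
    coef, *_ = np.linalg.lstsq(A, np.log(P), rcond=None)
    k1, alpha1, beta1 = np.exp(coef[0]), coef[1], coef[2]
    return k1, alpha1, beta1
```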
KNN is an instance-based regression model. When predicting the output for a new input, KNN does not construct an explicit model; instead, it finds the K training samples nearest to the input features and uses the weighted average of their target values as the prediction. KNN regression can adapt to local feature changes but is sensitive to high-dimensional data. The prediction is a weighted average in which each weight is defined as the reciprocal of the distance between samples:

$$w_i = \frac{1}{d\left(x, x_i\right)}, \qquad \hat{y} = \frac{\sum_{i=1}^{K} w_i\, y_i}{\sum_{i=1}^{K} w_i}$$

In core loss prediction, $x_i$ are the K input feature vectors from the training set that are closest to the new input $x$, $y_i$ are the core losses corresponding to these inputs, $w_i$ is the weight of $y_i$ (the greater the distance, the smaller the weight), and $\hat{y}$ is the distance-weighted average of the neighbors' losses.
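In scikit-learn, this distance-weighted neighbor averaging corresponds to KNeighborsRegressor with weights="distance", as sketched below; K = 5 and the dummy data are illustrative choices, since the value of K used in the comparison is not restated in this section.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 3)), rng.normal(size=500)

# weights="distance" reproduces the reciprocal-distance weighting w_i = 1 / d(x, x_i).
knn = KNeighborsRegressor(n_neighbors=5, weights="distance")
knn.fit(X_train, y_train)
y_hat = knn.predict(rng.normal(size=(10, 3)))
```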
For the four types of materials, the calculation results of three models (ANN, KNN, and SE) are shown in
Figure 5 and
Table 5.
It can be seen from the core loss graphs of the four materials in
Figure 5 that the prediction results of each model follow the variation trend of the true core loss values well, and that the predicted values of the ANN model are the closest to the true values. From the scatter plots, it can be observed that the predictions of all three models lie around the ideal fitting line, with the ANN model’s predictions lying closest to it. This indicates that, among the three models, the ANN model has the best prediction performance.
Table 5 shows that, for Material 1, the ANN model achieves the highest PA of 96.27%, while the SE model has the lowest PA at only 15.26%. The
R2 values of all models are above 0.9, indicating that all three models capture the overall variation trend of the data well. However, only the MAPE of the ANN model is within 5%. For Material 2, the accuracy of the KNN model is significantly improved compared with that for Material 1, with a PA of 83.64% and a MAPE reduced to 6.62%. Although the prediction accuracy of the SE model improves, it is still lower than that of the other two models. The MAPE,
R2, and PA of the ANN model are 4.56%, 0.9791, and 93.33%, respectively, which are better than those of the KNN and SE models. For Material 3, the ANN model still performs the best among the three models in terms of prediction accuracy and data fitting degree, with a PA of 93.42% and MAPE of 4.28%. For Material 4, the accuracy of the KNN model decreases compared with those for Material 2 and Material 3, with a PA of 43.94% and MAPE of 15.65%. In contrast, the accuracy of the ANN model is improved compared with those for the previous three materials, achieving a PA of 98.48%,
R2 of 0.9988, and MAPE of only 2.58%. In addition, the KNN model exhibits better MAPE performance for Material 2 compared with the other materials. As shown in
Figure 5a,c,e,f, the data distribution of Material 2 is more uniform, with significantly less clustering imbalance among the sample points than in the other materials. The denser and smoother distribution of Material 2 allows the distance-based local interpolation of the KNN model to perform more effectively. In contrast, for the other three materials with less uniform data distributions, the distance-weighted interpolation of KNN cannot adequately capture the coupled nonlinear relationships among features, resulting in decreased prediction accuracy.
4.2. Interpretability Analysis
To conduct the interpretability analysis of the model, Material 1 is taken as an example, and the SHapley Additive exPlanations (SHAP) method is adopted to explore how the input features contribute to the core loss predictions. First, background data are randomly sampled from the standardized training set. An explainer is then initialized with KernelExplainer, which is compatible with the MLP regression model under the scikit-learn framework: given the model’s prediction function and the background data, the explainer encapsulates the model’s prediction logic for interpretation. This explainer is then used to calculate SHAP values for the features of the standardized test set. The obtained SHAP values represent the contribution of each feature dimension of each sample to the model output, and the relationships between the features and the core loss predictions are presented from three perspectives: global feature importance, single-feature impact trend, and single-sample local interpretation. The global feature importance for Material 1 is shown in
Figure 6.
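The SHAP workflow described above can be reproduced roughly as follows with the shap package. The dummy data, the quickly fitted stand-in MLP, the background-sample size, and the plotting call are illustrative choices rather than the exact settings used for Figure 6.

```python
import numpy as np
import shap
from sklearn.neural_network import MLPRegressor

# Illustrative stand-ins for the standardized training/test features (T, F, Bm).
rng = np.random.default_rng(0)
X_train_std, y_train_log = rng.normal(size=(300, 3)), rng.normal(size=300)
X_test_std = rng.normal(size=(50, 3))

ann = MLPRegressor(hidden_layer_sizes=(26, 13), max_iter=500, random_state=42)
ann.fit(X_train_std, y_train_log)

# Background data: a random subset of the standardized training set.
background = shap.sample(X_train_std, 100, random_state=42)

# KernelExplainer wraps the scikit-learn MLP's prediction function.
explainer = shap.KernelExplainer(ann.predict, background)
shap_values = explainer.shap_values(X_test_std)

# Global importance / beeswarm summary (a Figure 6-style plot).
shap.summary_plot(shap_values, X_test_std, feature_names=["T", "F", "Bm"])
```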
Figure 6 shows the impact of each feature on the output of the core loss prediction model. The features are ranked in descending order of their mean absolute SHAP values as follows: peak magnetic flux density (
Bm), frequency (
F), and temperature (
T). This indicates that the peak magnetic flux density has the most significant impact on the model’s predictions, followed by frequency, while temperature has the relatively weakest impact.
In
Figure 6, red represents feature values with high magnitudes, and blue represents those with low magnitudes. From the correlation between feature values and SHAP values, high magnitudes of peak magnetic flux density and frequency (the red regions) correspond to positive SHAP values, which promote an increase in the predicted core loss; low magnitudes (the blue regions) correspond to negative SHAP values, which inhibit an increase in the predicted core loss.
In contrast, the SHAP values corresponding to high and low temperature values are scattered across both the positive and negative ranges, indicating that the impact of temperature on core loss follows a more complex pattern. Overall, however, high temperature values tend to correspond to negative SHAP values (inhibiting an increase in the predicted core loss), while low temperature values tend to correspond to positive SHAP values (promoting an increase in the predicted core loss).
4.3. Result Analysis
To gain a more intuitive understanding of the prediction error distribution and to evaluate the robustness and reliability of the models, a residual histogram of the ANN model’s predictions was plotted to illustrate how the residuals are distributed relative to the zero-error line in
Figure 7. In addition,
Table 6 summarizes and compares the prediction bias of each model across four materials to analyze their respective systematic deviations.
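Residuals, their mean (the bias summarized in Table 6), and a Figure 7-style histogram can be produced as sketched below. Matplotlib is assumed for the plot, the helper name is illustrative, and the residual is taken here as predicted minus true so that a positive mean indicates overestimation, consistent with the discussion of Table 6.

```python
import numpy as np
import matplotlib.pyplot as plt

def residual_summary(y_true, y_pred, label="ANN"):
    """Plot a residual histogram around the zero-error line and return the mean residual."""
    residuals = np.asarray(y_pred, float) - np.asarray(y_true, float)  # positive = overestimation
    plt.hist(residuals, bins=40, alpha=0.7, label=label)
    plt.axvline(0.0, linestyle="--")   # zero-error reference line
    plt.xlabel("Residual (W/m^3)")
    plt.ylabel("Count")
    plt.legend()
    return residuals.mean()            # mean residual, i.e., the systematic bias
```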
In
Figure 7, the residuals of the ANN model’s predictions for four materials are concentrated near the zero-error line and exhibit an approximately symmetric distribution, indicating that the ANN model has no significant systematic bias. The narrow and sharp shapes of the histograms suggest small residual variances and stable prediction performance. Among them, Material 4 shows the most concentrated residual distribution, implying the highest prediction consistency, while Materials 1–3 present slightly wider residual distributions, which can be attributed to the greater data dispersion of these materials.
As shown in
Table 6, the ANN model exhibits relatively small mean residuals for all four materials (within ±1300 W/m³), indicating that its predictions have no significant systematic bias. In contrast, the SE model shows large fluctuations in mean residuals among different materials, with a clear overestimation for Material 3 and an underestimation trend for the others. The KNN model, on the other hand, presents positive residuals across all materials, suggesting an overall overestimation tendency.
A comprehensive analysis of the prediction results for the three models across four materials indicates that the ANN model performs best in terms of prediction accuracy and bias control. Within the ±10% error range of the true values in the test set, the minimum PA of the ANN model is 93.33%, with an average of 95.38%, whereas the average PA values of KNN and SE models are 62.07% and 19.83%, respectively. From a statistical perspective, the ANN model not only maintains small mean residuals but also demonstrates good stability across different materials. Its error distribution is approximately symmetric and centered around zero, suggesting that the combination of logarithmic transformation and nonlinear fitting effectively reduces heteroscedasticity and enhances the generalization stability of the model.
In addition, it is worth noting from
Table 6 that the KNN model shows a smaller mean residual than the ANN model does for Material 3, but its PA value is lower. This occurs because the mean residual may approach zero when positive and negative errors offset each other, and thus a smaller mean residual does not necessarily imply higher prediction accuracy. To comprehensively evaluate the predictive performance of a model, multiple indicators should be considered together rather than relying solely on the mean residual.
5. Conclusions
Based on Problem C of the 2024 China Postgraduate Mathematical Contest in Modeling, this paper studies core loss under sinusoidal excitation and builds a core loss prediction model using the ANN method. By applying a logarithmic transformation to the core loss data of the training set fed into the ANN, the prediction of core loss under high-frequency sinusoidal excitation is improved.
A comparison between the ANN model with logarithmic transformation of the data and the ANN model without logarithmic transformation shows that, after logarithmic transformation of the core loss, PA increases from 57.35% to 96.27% and MAPE decreases from 14.27% to 3.86%. Further comparison of the three prediction models (ANN, KNN, and SE) for the four materials reveals that the prediction accuracy of the ANN is higher than that of the other two models. Within the error range of ±10% of the true values in the test set, the average PA of the ANN’s predictions for the four materials reaches 95.38%, the average R2 reaches 0.9873, the highest MAPE is 4.56%, and the lowest MAPE is only 2.58%. This confirms the better performance of the ANN model with logarithmic transformation of the input data in core loss prediction.
The ANN model proposed in this paper can complete training in approximately 1.49 s under an ordinary computer environment (Intel i9 CPU, 32 GB memory), with a peak memory usage of about 0.19 MB—indicating that the model has a low computational burden. This provides a practical data-driven method for predicting core loss. In practical applications, the trained model can be integrated into the core design process as a fast surrogate model to assist in material selection and performance evaluation, or combined with finite element simulation to reduce the computational cost in the design optimization process.
It should be noted that the current dataset only includes three parameters: magnetic flux density waveform, frequency, and temperature. Future work will focus on expanding the dataset to include a broader range of materials and excitation conditions, testing the model under non-sinusoidal waveforms, and exploring transfer learning techniques to improve adaptability across different magnetic materials. These efforts will further enhance the generalization and practical applicability of the proposed approach.