4.2. Training Machine Learning Models
Numerical training experiments started from the pressure probe footprints generated by the mould filling simulations. A set of 3000 pressure footprint images was used to train the CNN model with the purpose of predicting the five variables defining the dissimilar material region: its position, size and relative permeability. Both the pressure probe footprints and the five variables were already normalized, so no extra treatment was required.
The first step was to randomly split the generated dataset into two subsets, usually known as the training and test sets, containing the data in an 80/20 ratio. The test set was treated as a never-before-seen dataset with the intention of evaluating the model on new data not used during training.
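Such a split can be sketched as follows; this is a minimal sketch assuming the footprints and target variables are stored as paired NumPy arrays (the array names and the footprint shape are illustrative, not the paper's):

```python
import numpy as np

def split_dataset(images, targets, test_fraction=0.2, seed=0):
    """Randomly split paired arrays into training and test subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))        # random shuffle of sample indices
    n_test = int(len(images) * test_fraction)   # 20% held out, never seen in training
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (images[train_idx], targets[train_idx],
            images[test_idx], targets[test_idx])

# Dataset sizes as in the text: 3000 footprints, five normalized targets each.
images = np.zeros((3000, 10, 10))   # illustrative footprint shape
targets = np.zeros((3000, 5))
x_tr, y_tr, x_te, y_te = split_dataset(images, targets)
print(len(x_tr), len(x_te))         # 2400 600
```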
The subsequent step deals with the training of the CNN model by using Keras. The network described in the previous section was coded in Keras as a sequential stack of two convolutional layers and three dense layers (see Figure 6 for more details). Each layer applies tensor operations to its input data, and these operations make use of weight and bias factors. The weight and bias factors are the intrinsic attributes of the different layers and are the parameters in which the learning capacity of the network resides. A total of 860,517 parameters were used in the CNN model, Table 1. The network parameters are determined by minimizing a norm defined as the sum of the squared differences between the ground truth values of the variables and those predicted by the CNN. This Mean Squared Error, MSE = (1/N) Σ_{i=1..N} (y_i − ŷ_i)², with N the size of the dataset, y_i the ground truth values and ŷ_i the predictions, is used in this work as the loss function to minimize. An iterative minimization of the loss function in combination with a gradient descent variant called Adadelta was used to this end; the exact rules governing this use of gradient descent are defined by the Adadelta Keras optimizer. Training was carried out for not more than 5000 epochs with a batch size of 64 and lasted around 16 h on a 10-core Intel Xeon W-2155 CPU at 3.30 GHz. The evolution of the training and test losses against the number of training epochs is presented in Figure 7. The best model configuration produced a minimum MSE after training that was judged accurate enough for modelling purposes.
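The training setup described above can be sketched in Keras as follows. This is a minimal sketch, not the paper's exact network: the input shape and layer widths are assumptions (the real architecture and its 860,517 parameters are given in Figure 6 and Table 1), but the structure of two convolutional layers followed by three dense layers, the MSE loss and the Adadelta optimizer follow the text:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Two convolutional layers followed by three dense layers, ending in the
# five predicted variables (position, size and relative permeability).
model = keras.Sequential([
    keras.Input(shape=(32, 32, 1)),              # assumed footprint image shape
    layers.Conv2D(16, (3, 3), activation="relu"),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(5),
])

# MSE loss minimized with the Adadelta gradient descent variant, as in the text.
model.compile(optimizer=keras.optimizers.Adadelta(), loss="mse")

# Training as described in the text (batch size 64, up to 5000 epochs):
# history = model.fit(x_train, y_train, epochs=5000, batch_size=64,
#                     validation_data=(x_test, y_test))
```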
It is worth highlighting the similarity of the MSE loss curves obtained for the training and test datasets, which is an indicator of reasonable model performance on unseen data. Highly dissimilar behaviour of these two curves usually indicates overfitting, a common problem in machine learning. If the complexity of the network and the number of network parameters are too high with respect to the dataset size, overfitting is produced. In that case, the accuracy obtained after training can be excellent while the error on the test dataset remains unacceptable, indicating deficient model generalization to new unseen data. Several strategies were implemented in this work to alleviate possible overfitting problems according to recommendations found in the literature, namely data augmentation, regularization and a dropout rate in the fully-connected layers.
An augmented dataset was generated from the pressure sensor signals by adding white noise to each image of the training set. The white noise follows a normal distribution with zero mean and a standard deviation of 0.001. The augmented dataset then contains a total of 14,400 images: 2400 from the original set computed with OpenFOAM and the remaining 12,000 from the augmentation.
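The augmentation step can be sketched as follows; generating five noisy copies of each of the 2400 original training images reproduces the 14,400 total quoted above (array shapes are illustrative):

```python
import numpy as np

def augment_with_noise(images, copies=5, sigma=0.001, seed=0):
    """Return the original images plus `copies` white-noise versions of each.

    The noise is Gaussian with zero mean and standard deviation 0.001,
    as described in the text.
    """
    rng = np.random.default_rng(seed)
    noisy = [images + rng.normal(0.0, sigma, size=images.shape)
             for _ in range(copies)]
    return np.concatenate([images] + noisy, axis=0)

train_images = np.zeros((2400, 10, 10))   # illustrative training-set shape
augmented = augment_with_noise(train_images)
print(len(augmented))                      # 14400
```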
Regularization techniques add a constraining term to the MSE loss function which is proportional, through a regularization factor, to the total sum of the squared values of the network parameters. Thus, excessively large parameter values are penalized, preventing possible overfitting. Lastly, dropout rates were applied in the fully-connected neuron layers, entailing randomly dropping out (setting to zero) a number of output features of the layer during training, producing a less regular structure. The loss curves corresponding to the case without any of these strategies are also presented in Figure 7. Although the training loss in this case was excellent, the difference with the test loss was unacceptable. Thus, the model in this case was unable to generalize with the same precision level to new unseen data.
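In Keras, these two strategies are applied at layer level: a squared-weight penalty is added to the loss through `kernel_regularizer`, and a `Dropout` layer randomly zeroes a fraction of the outputs during training. A minimal sketch (the regularization factor, dropout rate and layer widths are illustrative assumptions, not the paper's values):

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

dense_block = keras.Sequential([
    keras.Input(shape=(128,)),
    # L2 penalty: adds factor * sum(weights**2) for this layer to the loss.
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # assumed factor
    # Dropout: randomly sets 20% of the outputs to zero, during training only.
    layers.Dropout(0.2),                                     # assumed rate
    layers.Dense(5),
])
```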
The comparisons between the ground truth values of the variables and those predicted by the CNN are gathered in Figure 8. The figure includes both training and test datasets. As a first approximation, the correlation between predicted and ground truth values was fairly good. This was especially true for the position of the dissimilar material region, Figure 8a,b. In this case, the network was able to learn in a highly efficient manner from the given footprint by using only the features associated with the rise of the pressure signals. However, the accuracy attained for the remaining variables was, in general, more modest, although the overall trends were perfectly captured, Figure 8c,d,f. A plausible explanation for this accuracy reduction is the similarity of the pressure fields generated by the presence of the dissimilar material region: two regions defined with similar values of size and/or relative permeability produce very close, almost indistinguishable fluid pressure fields, so the resulting pressure footprints are almost not single-valued with respect to the variables. This reduction of accuracy was more evident for the relative permeability parameter, which is essentially controlled by the pressure gradients. Figure 8e illustrates this point: the pressure fields for two small values of the relative permeability may differ only slightly once the macroscopic flow reaches the outlet gates, again producing almost identical pressure footprints. Nonetheless, the accuracy was judged reasonable for the automatic detection of the position and severity of the dissimilar material region.
The histograms of the individual absolute errors, computed as the absolute difference between ground truth and predicted values, are also presented for the five variables in Figure 8f. As mentioned previously, the prediction of the two position variables was excellent, and the error in this case exhibits a Dirac-delta-like distribution, with most of the data lying within a very narrow absolute error band. It should be pointed out that the model variables were expressed in non-dimensional, normalized form, and thus the absolute errors were expressed as percentages. The error distribution for the remaining variables was, of course, flatter, and the plausible reasons were discussed previously. The fraction of the total data corresponding to predictions with an absolute error lying within a given error band is presented in Table 2 for the sake of completeness.
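The per-variable absolute errors and the fraction of predictions falling within a given error band, as reported in Table 2, can be computed as follows (a minimal sketch with illustrative data):

```python
import numpy as np

def fraction_within_band(y_true, y_pred, band):
    """Fraction of samples whose absolute error |y - y_hat| is below `band`,
    computed independently for each predicted variable (column)."""
    abs_err = np.abs(y_true - y_pred)          # shape (n_samples, n_vars)
    return (abs_err < band).mean(axis=0)       # one fraction per variable

# Tiny illustrative example with two variables and three samples:
y_true = np.array([[0.10, 0.50], [0.20, 0.40], [0.30, 0.90]])
y_pred = np.array([[0.11, 0.58], [0.19, 0.41], [0.32, 0.60]])
print(fraction_within_band(y_true, y_pred, band=0.05))
```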
Figure 9a is presented to illustrate the overall performance of the model. The plots contain some randomly selected dissimilar material regions together with the corresponding predictions obtained by using the test dataset. As discussed previously, the accuracy of the predictions was fairly good, thus showing the ability of the proposed model to capture the presence of dry regions during liquid moulding.
The accuracy of the model was also addressed for additional cases with different pressure network sizes corresponding to 4, 9 and 16 equally spaced pressure sensors. It should be noted that, as the OpenFOAM simulations were run a single time, saving the pressure probe evolution at the locations corresponding to each specified network, no further recalculations were needed. The three models were trained by using the same procedure previously explained, and the corresponding MSE losses obtained for the three networks were 0.016, 0.011 and 0.012, respectively. These MSE losses were very similar to each other. Such results seem to indicate that the dissimilar material region sizes used in this study, which follow a uniform distribution, are captured even with the coarsest network of 4 sensors, and that increasing the number of sensors will not result in a better accuracy of the model for such region sizes. Accordingly, the sensor network size should be determined beforehand if the detection of a minimum dissimilar material size is sought. The predictions for the ground truth cases presented in Figure 9a are summarized in Figure 9b for the sake of completeness.
Lastly, the flow progress predictions for the case presented in Figure 3 are shown in Figure 10. This case corresponds to a square dissimilar material region with prescribed centre position, size and relative permeability. The pressure footprint presented in Figure 4a was used as input to predict the position, size and relative permeability, yielding a predicted 5-tuple close to the ground truth values. OpenFOAM simulations were subsequently run and the corresponding flow patterns gathered in Figure 10. The agreement between the ground truth flow patterns shown in Figure 3 and the predicted ones was excellent in terms of MSE, considering that the only information used comes from a discrete network of pressure sensors.