Two numerical case studies were conducted (distinct from the three flow cases discussed earlier). The first case study demonstrates the importance of specifying a convergence criterion, enabling the model to interpret the user’s definition of grid convergence. This was achieved by applying the similarity model to the geometry test cases without and with a user-defined convergence specification, and subsequently comparing the results. The second case study investigates feature selection for the model by evaluating its performance using different sets of flow features as inputs. The model was applied with various feature combinations, and its outputs were assessed against the mesh convergence results obtained earlier, as well as through visual inspection of scalar field slices.
5.2. Case Study 1: Feature Convergence Limit Specification
The first numerical investigation aimed to determine whether the machine-learning (ML) framework requires explicit user-defined thresholds to identify when a flow feature should be considered numerically significant. A central objective of the proposed ML methodology is to minimize user intervention. Reducing the need for expert input improves model generalizability, decreases susceptibility to user-induced bias, and allows the framework to adapt more robustly to new flow configurations. During evaluation of the model using the bump test case, however, it became evident that some degree of user-defined guidance was necessary. In particular, anomalously low similarity scores were observed in regions of the freestream that are physically unaffected by grid refinement. As shown in
Figure 17, these low-similarity zones appear far from the bump, in areas where the flow field should remain essentially invariant to mesh perturbations. Based on physical intuition and standard grid-sensitivity arguments, the solution in these regions should remain nearly unchanged with grid refinement. Since there are no strong gradients or localized features, the similarity scores should be close to one. Further examination revealed that the ML model exhibited excessive sensitivity to small-magnitude variations in the feature fields. This scale sensitivity led the model to assign disproportionate importance to numerically insignificant fluctuations, thereby producing artificially low similarity values in otherwise uniform regions. These findings indicate that, without appropriate magnitude-based normalization or user-defined tolerances, the model may misinterpret benign numerical noise as physically meaningful deviations.
In the anomalous regions shown in
Figure 17, the model detected differences between the two grids, but the changes in feature magnitude were extremely small. In this example, the feature under consideration was the
velocity component, and the average difference between the grids in these regions was approximately 0.01%. Despite the small magnitude, the model identified this variation and separated the grids into distinct clusters, resulting in a low similarity score. However, the model did not recognize that, although the distributions differed slightly, the magnitude of the change was so small that it would be considered insignificant by a human observer. This limitation is not a failure of the model itself, since it was never provided with information about which differences should be treated as meaningful. To address this issue in the baseline model, a mechanism was introduced to allow the user to define a feature validity limit. This limit is applied by evaluating each feature using Equation (
24), which involves the validity threshold for the i-th feature, the corresponding feature set, the median operator Med, and the two grids j and k being used in the calculation.
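One form of such a criterion that is consistent with the definitions above can be written as follows. The symbols are assumptions made for illustration, since the original notation is not reproduced here: $\varepsilon_i$ stands for the validity threshold of the i-th feature and $F_i^{(j)}$ for that feature sampled on grid j. Whether the inequality is strict, and whether any additional normalization is applied, is likewise assumed.

```latex
\left| \operatorname{Med}\!\left(F_i^{(j)}\right) - \operatorname{Med}\!\left(F_i^{(k)}\right) \right| \;\ge\; \varepsilon_i
```

Under this reading, a feature that satisfies the inequality is retained as an input, and a feature that does not is treated as converged.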
If a feature fails to satisfy the validity criterion in Equation (24), it is considered converged by user definition and is removed from the feature set provided to the ML model. If none of the features pass the validity test, the model is left with an empty feature set, and the similarity score is manually assigned a value of 1, since all features are deemed to have converged.
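The filtering logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout, and the `similarity_model` callable are hypothetical, and the criterion is assumed to be an absolute difference of medians compared against a per-feature limit.

```python
import numpy as np


def filter_valid_features(feats_j, feats_k, limits):
    """Return the names of features whose median change between two grids
    meets the user-defined limit (assumed form of the validity criterion)."""
    valid = []
    for name, eps in limits.items():
        delta = abs(np.median(feats_j[name]) - np.median(feats_k[name]))
        if delta >= eps:  # change is large enough to be meaningful
            valid.append(name)
    return valid


def similarity_with_validity_check(feats_j, feats_k, limits, similarity_model):
    """Apply the validity check, then call a user-supplied similarity model.

    `similarity_model` is a hypothetical callable taking two feature
    dictionaries and returning a score in [0, 1].
    """
    valid = filter_valid_features(feats_j, feats_k, limits)
    if not valid:
        # Every feature is converged by user definition: score is set to 1.
        return 1.0
    return similarity_model(
        {n: feats_j[n] for n in valid},
        {n: feats_k[n] for n in valid},
    )
```

For a freestream-like sample in which the median changes by roughly 0.01% and the limit is 0.01, the feature is dropped and the score is returned as 1 without the model being evaluated.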
Figure 18 and Figure 19 illustrate the implementation of this feature-validity check using the streamwise velocity for the 2D bump-in-channel case. Pane A in both figures shows the similarity score computed without a feature-validity check, whereas Pane B shows the score after applying the validity criterion. In Figure 18, the validity check clearly eliminates the spurious low-similarity regions in the far field. When zooming into the bump region in Figure 19, it becomes evident that the validity check isolates the only physically significant differences between the grids to the vicinity of the bump, which is the expected behavior.
Absolute limits also allow users to introduce custom normalizations that may be better suited to a specific problem. For example, one may normalize the velocity-gradient feature by the local velocity (the normalized expression is written in terms of the streamwise and normal directions). This type of normalization enables the user to define an absolute limit in terms of a physically intuitive quantity, namely “the fractional change in velocity per unit distance”. Such an interpretation is straightforward to understand and directly relevant to many flow problems.
Figure 20 compares two approaches applied to the same gradient feature: no limit, and an absolute limit applied to the normalized gradient. The results show that both approaches perform effectively when the absolute limit is paired with a feature that is properly scaled across the domain.
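As an illustration of the kind of normalization discussed above, the sketch below divides a velocity gradient by the local velocity magnitude, giving a fractional change in velocity per unit distance. The exact expression used in the study is not reproduced here, so the choice of gradient direction, the array layout, and the `floor` guard against division by zero are all assumptions.

```python
import numpy as np


def normalized_gradient(u, y, floor=1e-12):
    """Gradient of u along axis 0, divided by the local velocity magnitude.

    u     : 2D array of velocity samples, shape (ny, nx)   (assumed layout)
    y     : 1D array of coordinates along axis 0, length ny
    floor : hypothetical guard that avoids dividing by zero in stagnant cells
    """
    dudy = np.gradient(u, y, axis=0)  # finite-difference gradient along y
    return dudy / np.maximum(np.abs(u), floor)
```

With this scaling, an absolute limit of, say, 0.05 would read as "a 5% change in velocity per unit distance", independent of the local velocity level.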
This case study highlights the importance of having users define for the model what they consider a “converged” feature. If users do not specify the level of change they regard as converged, the model tends to be overly sensitive to minor variations in the flow field. These minor changes can significantly affect the similarity scores, leading to artificially low scores in areas that are mostly unaffected by grid refinement. Introducing feature checks based on the validity-threshold parameter proved to be an effective way of eliminating these non-physical similarity scores. However, a feature limit was found to be problematic for features that do not scale consistently throughout the domain, where it can result in excessively high similarity scores. In these cases, the user must provide an intuitive scaling for the feature throughout the domain.
From case study one, it was observed that the choice of the validity threshold plays a role in the final similarity score. While the model is sensitive to the value of this threshold, choosing it is intuitive, since it is based purely on what the user would deem to be a significant change. When setting the threshold, the user needs only to answer the following question: “If the difference in median values between the two cases were below the threshold, would I consider them significantly different?” By answering this question, the user conveys to the model what a human would consider converged. Thus, while the model is influenced by the value of the parameter, it is not a value that must be tuned through complex search methods. To help the reader better understand the influence of the threshold on the solution, Figure 21 shows a sweep of the parameter. The jet grids J3 and J0 were used to calculate the similarity score for a single feature while sweeping the threshold from 0.025 to 1. The figure demonstrates that as the threshold is increased, the regions of dissimilarity shrink, with only the strongest regions, those along the jet axis, remaining at the largest threshold values.
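The qualitative effect of such a sweep can be reproduced on synthetic data. The sketch below is only an illustration: the profiles are invented Gaussian shapes rather than jet solutions, the windowing into local regions is a hypothetical stand-in for however the model partitions the domain, and only the end points of the sweep (0.025 and 1) come from the text.

```python
import numpy as np


def active_fraction(field_j, field_k, eps, window=8):
    """Fraction of local windows whose median change meets the limit eps."""
    n = (min(len(field_j), len(field_k)) // window) * window
    a = np.asarray(field_j[:n]).reshape(-1, window)
    b = np.asarray(field_k[:n]).reshape(-1, window)
    delta = np.abs(np.median(a, axis=1) - np.median(b, axis=1))
    return float(np.mean(delta >= eps))


# Synthetic stand-ins for a feature sampled on a coarse and a fine grid.
x = np.linspace(-3.0, 3.0, 240)
coarse = np.exp(-x**2)
fine = 1.5 * np.exp(-x**2)

# Intermediate threshold values are hypothetical.
thresholds = [0.025, 0.1, 0.25, 0.5, 1.0]
fractions = [active_fraction(coarse, fine, eps) for eps in thresholds]
```

Because a window is counted only when its median change meets the limit, the active fraction can never grow as the threshold is raised, which mirrors the shrinking regions of dissimilarity described above.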
5.3. Case Study 2: Feature Selection
The second numerical experiment focused on examining feature selection within the model. Since the framework is designed to accept any flow variable as input for computing similarity, an important question arises: which features provide the most meaningful and physically consistent results? In this context, a “better” feature is defined as one whose similarity trends align closely with the convergence behavior observed in standard mesh-confirmation analyses and whose response clearly and intuitively differentiates between grid levels. To investigate this, the axisymmetric jet and bump geometries were used as test cases. Five different feature sets were evaluated using the similarity model with the feature-validity criterion applied. The selected features were the streamwise velocity, the transverse velocity, the two individual velocity gradients, and the combined gradient set.
We begin the feature-selection analysis with the bump cases. Before examining the similarity results, a brief summary of the mesh-confirmation study is helpful. Analysis of the skin-friction, velocity, and eddy-viscosity profiles in the bump wake showed substantial differences between grids B3 and B2, smaller differences between B2 and B1, and virtually no change between B1 and B0. Based on these findings, comparisons involving grid B3 should yield the highest similarity with B2, with progressively lower and nearly indistinguishable similarity scores when compared with B1 and B0.
Figure 22 and
Figure 23 present the similarity fields for the bump case using the
and
velocity components, respectively. In both cases, the expected trends from the mesh-confirmation study are reproduced, with B3 and B2 showing nearly identical similarity levels. However, using velocity alone as the feature does not highlight the differences between B3 and B2 as strongly as one might anticipate. This behavior can be understood by examining the velocity profiles in
Figure 12, where
reaches its freestream value by
. Consequently, most of the mesh-dependent variation occurs close to the wall, which is not fully captured in the plotted slices. The
feature set identifies differences between B3 and the finer grids primarily around the bump and in the near-wall region of the far wake. In contrast, the
velocity component detects differences mainly from the leading edge of the bump up to its apex, reflecting the physical structure of the flow in that region.
Extending the feature analysis to the gradients,
Figure 24,
Figure 25,
Figure 26 and
Figure 27 show the ML results for the individual velocity gradients
and
, and the combined gradient set
. Starting with the
feature, it can be seen that once again the model does a good job at providing almost identical predictions for grids B1 and B0. The
feature agrees with the velocity features in the respect that it shows very little difference between grids B3 and B2. The
does show some difference between the B3 grid and the B1 and B0 grids in the near wake, but not in the far wake, like the velocity features.
The
feature set, shown in
Figure 25, is the first to reveal a clear distinction between grids B3 and B2. The transverse velocity gradient in the
direction highlights more pronounced differences between B3 and the finer B1 and B0 grids than any of the previously examined features. This behavior is further illustrated in
Figure 26, which displays the
field at the same locations used for the similarity slices. In this figure, one can see that grids B3 and B2 smear the gradients directly over the bump, producing a noticeably different distribution from that of the finer meshes. The coarse grids also generate a small pocket of negative
immediately downstream of the bump, while this feature is absent in the B1 and B0 solutions. In the far wake, B3 and B2 exhibit near-wall gradient structures that differ substantially from those in B1 and B0. Both of these mesh-dependent discrepancies, near the bump and in the downstream region, are well captured by the model’s similarity predictions, demonstrating the sensitivity of the gradient-based feature to physically meaningful changes in the flow.
Finally, the
and
fields were combined into a single feature set and input to the model. The resulting similarity predictions are shown in
Figure 27. This combined-gradient feature set produces the widest range of similarity scores among all the cases examined. It is important to note that the resulting field is not simply a linear superposition of the outputs obtained from the individual
and
features. Instead, the combined input retains the dominant trends seen in both individual feature sets while amplifying regions of dissimilarity, particularly in the B3 to B2 comparison. This suggests that combining complementary gradient information provides a more sensitive indicator of mesh-dependent flow variations.
To continue the feature-selection analysis, the following figures compare grid J3 of the jet case with grids J2, J1, and J0 using each of the feature sets. Before examining these results, it is helpful to summarize the expectations based on the mesh-confirmation study presented earlier. The analysis showed substantial differences in both the centerline and radial velocity profiles between successive grids until J1 and J0, whose profiles nearly overlapped. Accordingly, when evaluating the similarity predictions, we expect grid J2 to appear most similar to J3, while grids J1 and J0 should yield nearly identical similarity fields. The mesh-confirmation study also revealed that differences in the radial streamwise velocity profiles became more pronounced farther downstream from the jet inlet, a trend that should be reflected in the similarity distributions.
Figure 28 and
Figure 29 present slices of the jet domain with similarity scores computed using the
and
velocity components as features, respectively. The most prominent difference between the two feature sets is the ability of
to capture changes along the centerline of the jet, consistent with the mesh-confirmation results showing significant variation in the radial velocity profile. In contrast, the
feature shows almost no difference near the centerline for grid J2 and only mild variations for grids J1 and J0. The
feature does exhibit some limitations. A few anomalously low similarity scores appear just above the jet inlet, a region where no meaningful flow features exist. In addition,
indicates slight differences between grids J1 and J0, whereas
more accurately assigns these grids nearly identical similarity scores when compared to J3, which is the expected behavior.
Continuing to
Figure 30,
Figure 31 and
Figure 32, where the individual velocity gradients
and
, as well as their combined set
, are used as features, it is evident that none of the gradient-based features capture the mesh differences along the centerline as clearly as the
feature. The velocity profiles in
Figure 33 help explain this behavior. From
Figure 33, it can be observed that the refinement of the mesh extends the jet’s influence down the domain, as noted by the “End of Jet” annotations. The refinement also slows the spread of the jet in the radial direction, as marked by the “Spread Measure” annotation, which was placed at y = 1.5 and extended until
. When comparing these profiles to the similarity fields generated using
, the model primarily detects changes in the region where the jet spreads outward, approaching the freestream value. The
feature detects the differences along the centerline of the jet. Finally, the
feature shows minimal variation between the meshes, except in the immediate vicinity of the jet inlet. The combined-gradient feature set produces a similarity field that emphasizes many of the same regions identified by the individual gradients. Importantly, the combined result is not a linear superposition of the two single-feature outputs. Instead, it mostly resembles one of the single-gradient predictions while incorporating some of the lowest-similarity regions associated with the other, which suggests that the model assigns greater influence to the more informative gradient component.
To better assess how each feature performs across the full grid hierarchy,
Figure 34 and
Figure 35 present the similarity scores for each sequential grid pair, (J3, J2), (J2, J1), and (J1, J0), in the jet flow case, using the
feature and the combined gradient features
and
. Both feature sets correctly indicate that there is essentially no change in the solution between grids J1 and J0, consistent with the mesh-confirmation analysis. The differences between the feature sets become more apparent when comparing grids J2 and J1. The
feature suggests that the change between J2 and J1 is larger than the change between J3 and J2. This trend is physically reasonable when considering the relative refinement ratios: J2 to J1 has a refinement ratio of 2.6, whereas J3 to J2 has a refinement ratio of 1.7. The larger refinement jump between J2 and J1 naturally allows for more substantial differences to emerge in the computed solution.
When comparing the results from the gradient feature sets with those from the velocity feature, we observe that the gradient-based features do not reproduce the trend in which the change between grids J2 and J1 exceeds that between J3 and J2. Another noteworthy observation is that, for both gradient features, the regions highlighted with low similarity scores remain consistent across all grid pairs. In other words, the areas identified as changing do not significantly shift in shape or location as the mesh is refined. This behavior further supports the reliability of the model, given that the jet meshes were generated using a uniform global refinement ratio applied to all cells. Because no localized refinement was introduced, any meshing deficiencies are addressed uniformly as the grid is refined. As a result, one would expect the regions of physical change in the solution to remain in the same locations while their magnitude evolves with each refinement step, a trend that is reflected in the similarity predictions.
5.4. Summary Observations: Feature-Set Sensitivity and Comparative Performance
From the analysis of both cases, no single feature set emerged as the clear best choice. Each feature set produced results that were logical and consistent with the physics. Of the two velocity components, one was the least informative, showing very little difference between grids in either the jet or bump cases. The other velocity component performed better. In the bump case, it highlighted changes near the bump and in the near-wall wake region. In the jet case, it captured large differences along the centerline, which agrees with the mesh-confirmation findings.
The gradient features were the most sensitive. For the bump case, they produced the lowest similarity scores and exposed questionable gradient behavior in the coarse meshes. For the jet case, one gradient component dominated, while the other showed almost no meaningful distinction between grids. It is also important to note that combining the two gradient features did not create a linear combination of their individual outputs. Instead, the combined set produced a new pattern influenced by both, but weighted more strongly toward the dominant component.
Overall, there was no clear best feature set for the jet case. Both velocity and gradient features had strengths and weaknesses. The model performed well with either a single feature or multiple features. There is also potential value in using other flow variables, such as pressure or turbulence quantities, but additional work is needed to confirm their effectiveness. Second-order gradients were also tested, but they did not offer noticeable improvements over the first-order features and hence, for brevity, were not included in this paper.
Based on these results, the authors recommend keeping the “No Free Lunch” theorem of machine learning [38] in mind. This principle states that no single model, or in this context, no single feature set, works best for all problems. Each application requires testing and selecting the feature set that performs best for that specific case. For practical use of this model, two suggestions are made. First, we suggest starting with both the velocity and the gradient feature sets, then placing greater emphasis on whichever feature set aligns most closely with the needs and physics of the user’s problem. Second, the user is advised not to “boil the ocean” with feature selection. If too many features are added to the model, its predictive ability and interpretability will be impaired. This is due to the “curse of dimensionality” [39], which describes how finding structure becomes more difficult as the dimensionality of the data increases. For clustering approaches, this is caused by the growth in distance between points, which makes the data sparser and the cluster-fitting process less effective at extracting meaningful groupings. In addition to reducing the model’s effectiveness, a large feature set also hinders the user’s ability to interpret the decision-making of the model, as the user cannot as easily link the similarity score values to any particular input feature. Currently, the authors advise keeping the number of features below 5, as this was the maximum tested during development.
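The distance argument behind the curse of dimensionality can be illustrated numerically. The sketch below is a generic demonstration on uniformly random points and is not tied to the similarity model or to any data from this study. The function name and the sample sizes are arbitrary choices. It measures the relative contrast between the largest and smallest pairwise distances, which tends to collapse as the number of dimensions grows.

```python
import numpy as np


def distance_contrast(n_points, n_dims, seed=0):
    """Relative spread (max - min) / min of pairwise Euclidean distances
    for points drawn uniformly from the unit hypercube."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n_points, n_dims))
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.sqrt((diffs**2).sum(axis=-1))
    pair = dists[np.triu_indices(n_points, k=1)]  # unique pairs only
    return (pair.max() - pair.min()) / pair.min()


# Contrast for a low-, mid-, and high-dimensional feature space.
contrast_by_dim = {d: distance_contrast(200, d) for d in (2, 5, 50)}
```

When the contrast is small, near and far neighbors become hard to tell apart, which is one reason a clustering step has less structure to work with when many features are supplied at once.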