3.1. Single Subtle Parameter Identification
To explore the feasibility of deep learning methods for identifying micro-crack groups and to lay the groundwork for the subsequent work, we first conducted a test of single-parameter identification. For identifying either the quantity parameter or the size parameter alone, we used one-dimensional convolutional neural networks (1D CNNs), which have achieved notable success in feature extraction and recognition for one-dimensional time-series signals [35,36]. We utilized single-harmonic nonlinear responses extracted from the FEM model set as the sample set and established a 1D CNN model for identifying single subtle parameters of micro-crack groups. Each sample, with a shape of (1, 4760), was input into the network through a single channel. As illustrated in Figure 5, the input sample was either the second harmonic or the third harmonic selected from the FEM model set.
For brevity, in this paper, the annotation following each convolutional layer indicates four hyperparameters: the number of input channels, the number of output channels, the convolutional kernel size, and the stride. For the max-pooling layers, the annotation indicates the pooling kernel size and the stride. To enhance the network's nonlinear capability, we applied a Rectified Linear Unit (ReLU) activation function after each convolutional layer. As shown in Figure 5, after initial optimization, the hyperparameters of the layers in the 1D CNN architecture, along the direction of the arrows, were set to Conv1d (1, 12, 3, 1), Maxpool1d (4, 2), Conv1d (12, 24, 3, 1), and Maxpool1d (4, 2).
The flattened output of the final max-pooling layer was fed into a fully connected layer (FC). The FC input size is 7104, and the output size varies with the identification task (three neurons for quantity-parameter identification and four neurons for size-parameter identification). The FC output was then processed by a softmax layer to obtain probability distributions summing to 1. Finally, an argmax function assigned the label with the highest probability as the identification label for each testing sample, which was compared with the true label to calculate the identification accuracy. Note that all weights, including the convolutional kernel weights and the FC neuron weights, were initialized from a normal distribution, and all initial biases were set to 1.
A mean square error (MSE) loss function and an Adam optimizer were used in training. To reduce the 1D CNN's iteration time and overfitting risk, the learning rate gradually decreased from 0.003 to 0.0001 over 20 steps. All neural networks were implemented using PyTorch (Python 2.7) on a GeForce RTX 4060 Laptop GPU (NVIDIA, Santa Clara, CA, USA).
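For concreteness, the following is a minimal PyTorch sketch of the single-harmonic 1D CNN and training configuration described above. The class and variable names are ours, the flattened FC input size is inferred at run time (the paper reports 7104), and the linear learning-rate decay is only one plausible reading of "0.003 to 0.0001 over 20 steps"; it is an illustrative reconstruction, not the authors' code.

```python
# Minimal sketch (assumed names; LazyLinear infers the FC input size at run time).
import torch
import torch.nn as nn

class SingleHarmonicCNN(nn.Module):
    def __init__(self, num_classes):  # 3 for quantity, 4 for size identification
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 12, kernel_size=3, stride=1), nn.ReLU(),
            nn.MaxPool1d(kernel_size=4, stride=2),
            nn.Conv1d(12, 24, kernel_size=3, stride=1), nn.ReLU(),
            nn.MaxPool1d(kernel_size=4, stride=2),
        )
        self.fc = nn.LazyLinear(num_classes)  # in_features inferred on first forward

    def forward(self, x):                     # x: (batch, 1, 4760)
        z = self.features(x).flatten(1)
        return torch.softmax(self.fc(z), dim=1)

model = SingleHarmonicCNN(num_classes=3)
model(torch.zeros(1, 1, 4760))                # dummy pass initializes lazy layers first
criterion = nn.MSELoss()                      # MSE loss on one-hot targets, as stated
optimizer = torch.optim.Adam(model.parameters(), lr=0.003)
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.0001 / 0.003, total_iters=20)

x = torch.randn(8, 1, 4760)                   # dummy batch of harmonic responses
y = torch.eye(3)[torch.randint(0, 3, (8,))]   # dummy one-hot labels
for epoch in range(2):                        # the paper trains for 600 epochs
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()
pred = model(x).argmax(dim=1)                 # argmax gives the identified label
```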
It took 6.14 s to train the 1D CNN for 600 epochs to identify the quantity parameter. We recorded the identification accuracy once it had stabilized with increasing iterations. To minimize errors, we randomly re-split the sample set and re-initialized the network weights, repeating this process 100 times. The mean identification accuracy is shown in Figure 6. Under all conditions, the identification accuracies for the quantity parameter obtained from the second harmonic exceed 99%, demonstrating a higher sensitivity than the third harmonic. Therefore, the 1D CNN driven by the second harmonic can be considered an effective method for identifying the quantity parameter of micro-crack groups.
It took 8.16 s to train the 1D CNN for 600 epochs to identify the size parameter. As shown in Figure 7, the second harmonic exhibits higher sensitivity to the size parameter across all conditions. The highest identification accuracy, 94.42%, was achieved with the second harmonic when the number of micro-cracks ranged from 210 to 300. However, size-parameter identification is generally inferior to quantity-parameter identification for all nonlinear responses.
The results of this test confirm that the 1D CNN has a certain ability to identify subtle parameters of micro-crack groups and also demonstrate the significant advantage of the second harmonic in identifying the quantity and size parameters. However, for the problem of multi-parameter decoupled identification, it is uncertain whether the second harmonic can maintain a high identification accuracy. Therefore, the focus of our next work is dual-parameter decoupled identification.
3.3. Multi-Harmonic Fusion
To enhance the accuracy of decoupled identification of multiple subtle parameters, we considered the specific mapping relationships between micro-crack groups and the different nonlinear responses. Relying on a single nonlinear response is insufficient to comprehensively capture the subtle parameter information of micro-crack groups, so we combined independent nonlinear responses. Information fusion can effectively exploit multiple information sources to identify targets accurately. On this basis, we propose a series of new frameworks for the decoupled identification of multiple subtle parameters. For each of the sensors a, b, and c, the second and third harmonics were fused at the data level, the feature level, and the decision level.
Data-level fusion makes full use of the raw data and avoids information loss [37,38]. As shown in Figure 9a, we propose a method of concatenating the second and third harmonics received by a single sensor. The fused second–third harmonic served as the sample set, where each sample has a shape of (2, 4760). Accordingly, we established a 2D CNN and input the new samples through a single channel. The hyperparameters were set to Conv2d (1, 36, 2, 1), Maxpool2d (6), and FC (4758, 12). The annotation following each FC indicates two hyperparameters: the input size and the output size. Valid padding was used in each convolutional layer to control the filter movement. The other hyperparameters of the 2D CNN are similar to those of the previous 1D CNN. It took 32.94 s to train the 2D CNN for 600 epochs. The decoupled identification accuracies obtained over 100 training sessions, using the same method as before, are shown in Figure 8, with sensors a, b, and c achieving 92.32%, 88.84%, and 89.68%, respectively.
We also propose another data-level fusion method, as shown in Figure 9b. In this method, the second and third harmonics were fed into the 1D CNN through two separate channels. The hyperparameters were set to Conv1d (2, 12, 3, 1), Maxpool1d (4), Conv1d (12, 24, 3, 1), Maxpool1d (4), and FC (7104, 12). It took 24.92 s to train the 1D CNN for 600 epochs. The decoupled identification accuracies obtained over 100 training sessions, using the same method as before, are shown in Figure 8, with sensors a, b, and c achieving 92.96%, 91.74%, and 91.08%, respectively.
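As a sketch of this two-channel variant, the two harmonics simply occupy the two input channels of an otherwise unchanged 1D CNN. The names below are illustrative, and the FC input size is inferred automatically (it works out to 7104 for an input length of 4760, matching the value quoted above):

```python
# Minimal sketch of two-channel data-level fusion (illustrative names only).
import torch
import torch.nn as nn

class TwoChannelCNN(nn.Module):
    def __init__(self, num_classes=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(2, 12, kernel_size=3, stride=1), nn.ReLU(),
            nn.MaxPool1d(kernel_size=4),
            nn.Conv1d(12, 24, kernel_size=3, stride=1), nn.ReLU(),
            nn.MaxPool1d(kernel_size=4),
        )
        self.fc = nn.LazyLinear(num_classes)   # in_features inferred (7104 here)

    def forward(self, x):                      # x: (batch, 2, 4760)
        return torch.softmax(self.fc(self.features(x).flatten(1)), dim=1)

sample = torch.randn(4, 2, 4760)               # second + third harmonics as two channels
print(TwoChannelCNN()(sample).shape)           # torch.Size([4, 12])
```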
In contrast to directly fusing the raw data, feature-level fusion perceives and decides by fusing deep features extracted from the raw data [39]. As shown in Figure 9c, the second and third harmonics were fed separately into two independent and identical 1D CNNs for feature extraction. The 1D CNNs used here are similar to the one shown in Figure 5, with the hyperparameters set to Conv1d (1, 12, 3, 1), Maxpool1d (4, 2), Conv1d (12, 24, 3, 1), Maxpool1d (4, 2), and FC (7104, 12). The outputs of the two FCs were summed element-wise to fuse the features from the different nonlinear responses. It took 43.32 s to train the two 1D CNNs for 600 epochs. The decoupled identification accuracies obtained over 100 training sessions, using the same method as before, are shown in Figure 8, with sensors a, b, and c achieving 92.31%, 91.58%, and 90.82%, respectively. Thus, neither data-level nor feature-level fusion demonstrates a clear advantage; the specific reasons are not explored further in this paper.
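A minimal sketch of this feature-level fusion follows; the two branches are identical, and their FC outputs are added before the softmax. The class and helper names are ours, and the FC input size is again inferred at run time (the paper states 7104):

```python
# Minimal sketch of feature-level fusion: two identical branches, summed FC outputs.
import torch
import torch.nn as nn

def make_branch():
    return nn.Sequential(
        nn.Conv1d(1, 12, 3, 1), nn.ReLU(), nn.MaxPool1d(4, 2),
        nn.Conv1d(12, 24, 3, 1), nn.ReLU(), nn.MaxPool1d(4, 2),
        nn.Flatten(), nn.LazyLinear(12),
    )

class FeatureLevelFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.branch_2nd = make_branch()        # second-harmonic branch
        self.branch_3rd = make_branch()        # third-harmonic branch

    def forward(self, h2, h3):                 # each input: (batch, 1, 4760)
        fused = self.branch_2nd(h2) + self.branch_3rd(h3)  # element-wise sum of FC outputs
        return torch.softmax(fused, dim=1)

h2, h3 = torch.randn(4, 1, 4760), torch.randn(4, 1, 4760)
print(FeatureLevelFusion()(h2, h3).shape)      # torch.Size([4, 12])
```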
Decision-level fusion is a high-level fusion method that synthesizes the independent judgments of the individual information sources [40]. As shown in Figure 9d, the second and third harmonics were fed separately into two independent and identical 1D CNNs. The hyperparameters of each 1D CNN were still set to Conv1d (1, 12, 3, 1), Maxpool1d (4, 2), Conv1d (12, 24, 3, 1), Maxpool1d (4, 2), and FC (7104, 12). The outputs of the FCs were converted into basic probability assignments (BPAs) by two identical softmax layers. These BPAs then needed to be fully exploited. DS evidence theory has significant advantages in dealing with uncertain and imprecise information [41]. We therefore propose a decision-fusion method for multi-harmonic nonlinear ultrasonic responses that uses DS evidence theory to fuse the BPAs provided by the second and third harmonics, which are treated as two independent nonlinear responses in what follows. We established a frame of discernment for storing the various subtle parameter labels of micro-crack groups. From the previous part, the quantity and size of the micro-cracks in any FEM model established in this paper are uniquely determined, so each sample corresponds to a unique label. The 12 mutually exclusive subtle parameter labels in Figure 4, representing AD, AE, …, CG, form the frame of discernment under each sensor, as shown in (2). The BPA function was defined as a mapping from the power set of this frame to [0, 1]; each BPA value represents the support provided by a nonlinear response under a given sensor for a particular label and satisfies (3). A high BPA value indicates a high degree of support.
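For reference, in standard DS notation, which the paper's (2) and (3) presumably follow, the frame of discernment and the BPA constraints read as below; the symbols $F_1, \dots, F_{12}$ are our own placeholders for the labels AD, AE, …, CG:

\[
\Theta = \{F_1, F_2, \ldots, F_{12}\}, \qquad m : 2^{\Theta} \to [0, 1], \qquad m(\varnothing) = 0, \qquad \sum_{A \subseteq \Theta} m(A) = 1 .
\]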
We first used the classic Dempster combination rule, as shown in (4), where the conflict coefficient between the second and third harmonics under a given sensor is defined in (5). The BPA value provided by a sensor for each label is thus jointly determined by the second and third harmonics, and the label with the highest fused BPA value was taken as the identification label. It took 33.02 s to train the two 1D CNNs for 600 epochs. As shown in Figure 8, the mean identification accuracies over 100 training sessions for sensors a, b, and c are 93.22%, 92.06%, and 91.65%, respectively. Surprisingly, compared with the simple recognition framework driven by the second harmonic alone, decision fusion based on the classic DS evidence theory does not perform well for any sensor.
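For two bodies of evidence $m_1$ (from the second harmonic) and $m_2$ (from the third harmonic) over the same frame, the classic Dempster combination rule and the conflict coefficient have the standard textbook form, which (4) and (5) presumably follow:

\[
m(A) = \frac{1}{1 - K} \sum_{B \cap C = A} m_1(B)\, m_2(C) \quad (A \neq \varnothing), \qquad K = \sum_{B \cap C = \varnothing} m_1(B)\, m_2(C) .
\]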
3.4. Improvement of Multi-Harmonic Fusion
To find the cause, we analyzed the fusion process of numerous testing samples. We found that the decision framework fused all the information provided by the two nonlinear responses indiscriminately. However, as shown in the previous part, the identification performance of the third harmonic is inferior to that of the second harmonic. This means that the third harmonic, which carries more unreliable information, was given an equal opportunity to participate in the fusion, severely misleading the decision-making process. To illustrate this phenomenon, consider a testing sample in which the third harmonic strongly supports an incorrect label; the eight labels with extremely low BPA values are ignored, and the BPA values of the remaining labels are taken to sum approximately to 1. Based on the classic DS evidence theory in (4) and (5), the fused result adopts the suggestion of the third harmonic, leading to the misclassification of the testing sample. Therefore, it is necessary to assign different weights to the two nonlinear responses. When prior experience is available, the independent performance of each information source under the same conditions is usually used as the reference standard [18].
Based on the independent performances of the two nonlinear responses on the same 1D CNN from the previous part, the mean identification accuracy of each nonlinear response under a given sensor, as defined in (6), determines the credibility of that response. Clearly, a nonlinear response with a higher mean identification accuracy should have a higher credibility and can be assigned a higher weight, as shown in (7).
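One natural reading of (6) and (7), with the accuracy and weight symbols below being our own notation, is that each response's weight is its mean identification accuracy normalized over the two responses, so that the two weights sum to 1:

\[
w_i = \frac{\mathrm{Acc}_i}{\mathrm{Acc}_{2\mathrm{nd}} + \mathrm{Acc}_{3\mathrm{rd}}}, \qquad i \in \{2\mathrm{nd}, 3\mathrm{rd}\}, \qquad w_{2\mathrm{nd}} + w_{3\mathrm{rd}} = 1 .
\]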
After determining the weight of each nonlinear response, many researchers [19,42] calculate the weighted average evidence (WAE) according to (8) and fuse multiple WAEs with the classical Dempster combination rule. However, owing to the large difference between the weights assigned to the two nonlinear responses in this paper, the weighted-average method is no longer suitable for their fusion. Consider a testing sample for which the second harmonic receives a much higher weight than the third harmonic. The unnormalized weighted average evidence is then very similar to the BPA provided by the second harmonic alone, and almost all of the information contained in the third harmonic is ignored, regardless of whether it is beneficial or erroneous. Furthermore, the BPA values provided by the nonlinear responses in this paper are usually extreme because of the softmax layer. Consequently, no matter how strongly the third harmonic negates a particular label by assigning it a near-zero BPA value, that value can hardly influence the fusion result after weighted averaging; in other words, the influence of extreme probability values is suppressed by the weighted-average process. We therefore believe that preprocessing the BPAs with the weighted-average method is imprudent.
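For reference, the WAE used in [19,42] takes the standard weighted-sum form (presumably the form of (8)), which makes the issue visible: when $w_{3\mathrm{rd}} \ll w_{2\mathrm{nd}}$, the masses contributed by the third harmonic are almost entirely suppressed:

\[
\tilde m(A) = w_{2\mathrm{nd}}\, m_{2\mathrm{nd}}(A) + w_{3\mathrm{rd}}\, m_{3\mathrm{rd}}(A), \qquad A \subseteq \Theta .
\]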
To address this issue, we transferred the weights of the nonlinear responses onto the individual labels of the frame of discernment, and a simple method for determining the label weights was proposed as follows. First, for each nonlinear response under a given sensor, the labels whose BPA values are not the highest form the set of labels with non-highest support, i.e., the labels that this response does not support. The higher the weight of a nonlinear response under a sensor, the lower the weight that should be assigned to the labels it does not support; accordingly, the initial weight of a label decreases as the number of its non-supporters increases. All labels in the frame of discernment were thus divided into four categories: those supported by neither nonlinear response, those supported only by the second harmonic, those supported only by the third harmonic, and those supported by both. These categories correspond to the four cases for calculating the initial label weights in (9). Note that only the labels with non-highest support were penalized; labels supported by a nonlinear response keep a default initial weight of 1. The initial weights of all labels were then normalized according to (10) so that the final label weights sum to 1. These final weights were introduced into (4) to improve the classical Dempster combination rule, yielding (11), and the result of the improved rule was normalized according to (12) to obtain the BPA value provided by the sensor.
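The following Python sketch illustrates one plausible reading of (9)–(12); it is not the paper's exact formulation. In particular, the per-response penalty of (1 − w) applied to unsupported labels, the reduction of the classic rule to a normalized elementwise product for singleton-only BPAs, and all names are our own assumptions:

```python
# A sketch of label weighting and the improved combination rule (assumed forms).
import numpy as np

LABELS = ["AD", "AE", "AF", "AG", "BD", "BE", "BF", "BG", "CD", "CE", "CF", "CG"]

def label_weights(bpa_2nd, bpa_3rd, w_2nd, w_3rd):
    """Assumed reading of (9)-(10): each label not supported (not ranked highest)
    by a response is penalized by (1 - weight of that response); supported labels
    keep an initial weight of 1; the initial weights are then normalized."""
    top_2nd, top_3rd = np.argmax(bpa_2nd), np.argmax(bpa_3rd)
    init = np.ones(len(LABELS))
    for k in range(len(LABELS)):
        if k != top_2nd:
            init[k] *= (1.0 - w_2nd)
        if k != top_3rd:
            init[k] *= (1.0 - w_3rd)
    return init / init.sum()

def improved_combination(bpa_2nd, bpa_3rd, final_weights):
    """Assumed reading of (11)-(12): for singleton-only BPAs the classic rule is a
    normalized elementwise product; here each label's joint support is additionally
    scaled by its final weight before renormalization."""
    joint = final_weights * bpa_2nd * bpa_3rd
    return joint / joint.sum()

# Toy usage: the second harmonic supports label AD, the third supports label AE.
bpa_2nd = np.full(12, 0.01); bpa_2nd[0] = 0.89
bpa_3rd = np.full(12, 0.01); bpa_3rd[1] = 0.89
lw = label_weights(bpa_2nd, bpa_3rd, w_2nd=0.8, w_3rd=0.2)
print(improved_combination(bpa_2nd, bpa_3rd, lw))
```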
To demonstrate the advantages of the improved rule, we again take a testing sample for which the two nonlinear responses support different labels and have very unequal weights. After calculating the final label weights according to (9) and (10), the gap between the fused BPA values that the sensor assigns to the two competing labels decreases significantly. This indicates that, although the third harmonic is assigned a very low weight, its strong negation of a label is still taken into account in the decision process rather than being ignored. Numerous instances support this point. As another example, consider a testing sample in which the second harmonic supports an incorrect label while the third harmonic supports the true label; after calculating the final label weights according to (9) and (10) and normalizing the fusion result, the third harmonic rectifies the erroneous identification made by the second harmonic.
In summary, the improvement proposed in this paper enhances the rationality of the decision-fusion process and improves the accuracy of the decision results. We therefore re-fused the BPAs provided by the two nonlinear responses, taking the label with the highest fused BPA value as the identification label. As shown in Figure 8, the mean identification accuracies over 100 training sessions for sensors a, b, and c are 93.73%, 93.13%, and 92.24%, respectively, improvements of 0.51%, 0.93%, and 0.59% over the classic Dempster combination rule. This shows that the proposed improvement makes reasonable use of the beneficial information from the different nonlinear responses, thereby enhancing the identification performance of each sensor.
3.5. Multi-Sensor Fusion
The decision-level fusion of the multi-harmonic nonlinear responses effectively improves the identification accuracy of each sensor. However, the position of each ultrasonic sensor relative to the micro-crack groups is fixed, so each sensor receives an ultrasonic signal carrying incomplete information about the micro-crack groups. Multi-sensor data fusion can overcome this limitation of single sensors by integrating the crack information carried by all sensors, thereby enhancing the reliability and accuracy of the identification results [43]. Based on the BPAs obtained by the decision-level fusion of the multi-harmonic nonlinear responses under sensors a, b, and c, a sensor-level information fusion framework was established in this paper. Given that decision-level fusion performed best in the previous section, we continued to employ DS evidence theory to integrate the effective information from the three sensors, achieving decision-level fusion of multiple ultrasonic sensors, as shown in Figure 10. Since the identification targets are still the 12 subtle parameter labels, the frame of discernment is shown in (13), and its power set consists of all subsets of its elements. From the previous part, it is easy to see that the BPA values provided by the sensors must satisfy (14).
For the decision-level fusion of the multiple ultrasonic sensors, we used the classic DS combination rule, as shown in (15), where the conflict coefficient between any two sensors is defined in (16). The fused BPA value for each label is jointly determined by the three sensors, and the label with the highest BPA value was taken as the identification label. It took 98.75 s to train all the 1D CNNs for 600 epochs. As shown in Figure 11a, the mean identification accuracy over 100 training sessions is 94.07%, only a 0.34% improvement over sensor a.
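Since the softmax outputs assign mass only to the 12 singleton labels, combining the three sensor-level BPAs with the classic Dempster rule reduces to repeated normalized elementwise products. A minimal sketch (illustrative names; not the authors' code) is:

```python
# Classic Dempster combination of three singleton-only BPAs over the same frame.
import numpy as np

def dempster_combine(m1, m2):
    """For singleton-only BPAs, the combined mass is the normalized elementwise
    product; the normalizer joint.sum() equals 1 - K (the non-conflicting mass)."""
    joint = m1 * m2
    return joint / joint.sum()

def fuse_three_sensors(bpa_a, bpa_b, bpa_c):
    """Pairwise combination; Dempster's rule is associative and commutative."""
    return dempster_combine(dempster_combine(bpa_a, bpa_b), bpa_c)

# Toy usage with three random 12-label BPAs.
rng = np.random.default_rng(0)
bpas = [rng.dirichlet(np.ones(12)) for _ in range(3)]
fused = fuse_three_sensors(*bpas)
print(fused.argmax(), round(fused.max(), 3))   # identified label index and its BPA
```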
3.6. Improvement of Multi-Sensor Fusion
To find the cause, we analyzed the fusion process of numerous testing samples and found that there is usually a certain degree of conflict among the BPAs provided by the sensors. A sensor with a higher conflict coefficient than the others exhibits lower reliability and is more likely to provide erroneous information, which can strongly interfere with the identification results of the other sensors and increase the risk of decision fusion, even leading to counterintuitive decision results [44,45]. The classical DS evidence theory does not mitigate the negative impact of unreliable sensors, which greatly limits the improvement in identification accuracy. Therefore, it is necessary to assign weights to the sensors based on their conflicts [46]. Measuring conflicts and calculating the sensor weights is not a trivial task. In addition to the conflict coefficient of the classical DS evidence theory, many scholars [46,47,48,49] have proposed other conflict measures, such as similarity degrees or evidence distances. However, these measures also exhibit limitations when dealing with extreme probability values.
49] have proposed numerous conflict measurement methods, such as similarity degree or evidence distance. However, these methods also exhibit certain limitations when dealing with extreme probability values. Taking the testing sample
as an example, whose true label is
, for sensors
,
, and
, the labels
,
, and
satisfy:
It is easy to observe that the conflict between sensors
and
is significantly greater than that between sensors
and
. However, the calculated conflict coefficient between sensors
and
is
≈ 0.9676, and between sensors
and
is
≈ 0.9724, indicating that the conflict coefficient
does not exhibit high sensitivity to conflict. Additionally, according to the evidence distance measurement method proposed by Jousselme [
48], the calculated evidence distance between sensors
and
is
≈ 0.7614, and between sensors
and
is
= 0.7681. In fact, when
→ 0,
→ 0, and
→ 1, the distance
approaches a constant. Therefore, it is challenging to use Jousselme’s evidence distance to significantly represent the difference in the aforementioned conflicts.
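Jousselme's evidence distance [48] has the standard form

\[
d_J(m_1, m_2) = \sqrt{\tfrac{1}{2}\,(\mathbf{m}_1 - \mathbf{m}_2)^{\mathrm{T}}\, \underline{\underline{D}}\,(\mathbf{m}_1 - \mathbf{m}_2)}, \qquad \underline{\underline{D}}(A, B) = \frac{|A \cap B|}{|A \cup B|},
\]

so when each of two BPAs concentrates nearly all of its mass on a single (different) label, $d_J$ saturates close to a constant value regardless of how the small residual masses are distributed, which is the insensitivity noted above.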
To address this issue, a simple method for evaluating sensor conflicts was proposed as follows. For each sensor, the label with the highest BPA value is defined as the label most supported by that sensor; for simplicity, only the labels most supported by at least one sensor are considered. Take one pair of sensors as an example. First, when the two sensors most support the same label, the conflict between them is considered to increase as the geometric mean of the relevant products of their BPA values decreases; their weights then decrease, which means that the weight of the remaining sensor increases. Second, when the two sensors most support different labels, the conflict between them is considered to increase as the ratio of the corresponding BPA values increases; again their weights decrease and the weight of the remaining sensor increases. Accordingly, we defined a conflict function, as shown in (17), to measure the conflict between each pair of sensors; clearly, the weight of the sensor outside a pair increases as the value of that pair's conflict function increases, and the conflict functions of the other two sensor pairs are defined in the same way. To verify the rationality of the conflict function, we return to the example above: the conflict-function values calculated for the two sensor pairs are approximately 7.974 and 114.9, which clearly separates the pair with the much greater conflict from the other and aligns far better with our intuition than the conflict coefficient or the evidence distance.
The purpose of measuring conflicts is to determine the sensor weights, as shown in (18)–(22). We propose normalizing the conflict functions to assign the weights of sensors a, b, and c. Although the sensor weights may vary greatly, this variation largely stems from the extreme probability values in the BPAs provided by the sensors. Unlike this part, the previous part involved only two bodies of evidence in the decision fusion, and discussing conflict between only two evidence bodies is of little use. In multi-sensor decision fusion, however, the presence of extreme probability values usually causes a significant conflict between one sensor and the others, thereby reducing that sensor's weight. Based on this analysis, we consider it feasible to weight the BPA provided by each sensor to obtain the WAE. Finally, according to (23) and (24), the WAEs were merged five times by applying the classical Dempster combination rule, and the result was applied to the decoupled identification of the subtle parameters of micro-crack groups. As shown in Figure 11a, the mean identification accuracy over 100 training sessions is 95.68%, which is 1.61% higher than before the improvement. This shows that the proposed improvement mitigates the negative impact of conflicts to some extent and enhances the identification performance of the multi-sensor decision-fusion method.
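The following sketch shows one plausible reading of (18)–(24); the way each sensor's weight is derived from the conflict function of the other two sensors, the illustrative conflict values, and the interpretation of "merged five times" as five successive Dempster combinations of the WAE are all our own assumptions:

```python
# Sketch of conflict-weighted multi-sensor fusion (assumed forms and names).
import numpy as np

def dempster_combine(m1, m2):
    """Classic Dempster rule for singleton-only BPAs (normalized product)."""
    joint = m1 * m2
    return joint / joint.sum()

def conflict_weighted_fusion(bpas, conflicts, n_merge=5):
    """bpas: dict sensor -> 12-label BPA; conflicts: dict sensor pair -> value of (17)."""
    # Assumed: a sensor's weight grows with the conflict between the OTHER two sensors.
    raw = {"a": conflicts[("b", "c")],
           "b": conflicts[("a", "c")],
           "c": conflicts[("a", "b")]}
    total = sum(raw.values())
    weights = {s: v / total for s, v in raw.items()}        # normalization
    wae = sum(weights[s] * bpas[s] for s in bpas)           # weighted average evidence
    fused = wae
    for _ in range(n_merge):                                # "merged five times"
        fused = dempster_combine(fused, wae)
    return fused

# Toy usage with random BPAs and illustrative conflict-function values.
rng = np.random.default_rng(1)
bpas = {s: rng.dirichlet(np.ones(12)) for s in "abc"}
conflicts = {("a", "b"): 3.0, ("a", "c"): 1.5, ("b", "c"): 8.0}
print(conflict_weighted_fusion(bpas, conflicts).argmax())
```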
Figure 11b shows the confusion matrix of one representative identification result, with an identification accuracy of 95.8%.
The simulation results presented in this paper preliminarily demonstrate the excellent performance of the proposed multi-harmonic and multi-sensor fusion methods. Undoubtedly, environmental noise, as well as nonlinearities in the material and equipment, may lead to suboptimal ultrasonic signals, which could, to some extent, reduce the initial identification accuracy prior to fusion. However, we can reasonably speculate on the contribution of these methods in actual detection. On the one hand, regardless of the quality of the actual ultrasonic signals or the initial BPAs, the sensitivity-weighted decision-fusion method can effectively integrate the multi-harmonic BPAs. On the other hand, although the limited number of actual samples can lead to extreme BPAs, the conflict-weighted multi-sensor fusion method can still enhance the rationality of the decisions. Therefore, this paper theoretically presents a feasible fusion-based identification method, and future work will focus on validating the effectiveness of these approaches in practical detection.