Ultrasonic Sensor Signals and Optimum Path Forest Classifier for the Microstructural Characterization of Thermally-Aged Inconel 625 Alloy

Secondary phases, such as the Laves phase and carbides, form during the final stages of solidification of nickel-based superalloy coatings deposited by the cold-wire gas tungsten arc welding (GTAW) process. However, when the alloy is aged at high temperatures, other phases, such as γ″ and δ, can precipitate in the microstructure. This work evaluates the optimum path forest (OPF) classifier, configured with six distance functions, for classifying background echo and backscattered ultrasonic signals from samples of Inconel 625 superalloy thermally aged at 650 and 950 °C for 10, 100 and 200 h. The background echo and backscattered signals were acquired using transducers with frequencies of 4 and 5 MHz. The results confirmed the potential of ultrasonic sensor signals combined with the OPF to characterize the microstructures of Inconel 625 in the thermally aged and as-welded conditions. The experimental results revealed that the OPF classifier is sufficiently fast (total classification time of 0.316 ms) and accurate (accuracy of 88.75% and harmonic mean of 89.52) for the proposed application.


Introduction
Nb-bearing nickel-based superalloys, in particular Inconel 625, are more widely applicable than many other nickel (Ni)-based alloys, especially in highly corrosive environments such as those of the oil and gas industry. Nowadays, this alloy is widely used for weld overlays on the inner surface of carbon steel pipes and other offshore equipment. However, further studies of this alloy, such as the one presented in this paper, are needed to improve the understanding of its properties.
During the welding of Inconel 625, some elements, such as niobium (Nb) and molybdenum (Mo), segregate intensively to the interdendritic regions, supersaturating the liquid metal with these elements in the final stage of solidification. This results in the precipitation of the Nb-rich Laves phase and of primary MC-type carbides (NbC) [1,2]. This segregation and the precipitation of secondary phases can change the mechanical properties of the alloy and decrease its corrosion resistance [3]. In addition, the Nb-rich Laves phase has a low melting point, which widens the solidification temperature range and makes the alloy susceptible to solidification cracking [4].
Nondestructive testing based on ultrasonic signals has been commonly used to study this kind of material. Examples include the evaluation of the embrittlement kinetics and elastic constants of SAF2205 duplex stainless steel for different aging times at 425 and 475 °C [5], the study of the spinodal decomposition mechanism in UNS S31803 duplex stainless steel [6], the evaluation of the influence of grain refiners on the mechanical properties of a CuAlBe shape memory alloy [7], sigma phase detection in UNS S31803 duplex stainless steel [8], the characterization of welding defects [9], the characterization of cast iron microstructures [10], pattern classification in nondestructive materials inspection [11], the nondestructive characterization of microstructures and determination of elastic properties of plain carbon steel [12] and the evaluation of phase transformations in UNS S31803 duplex stainless steel [13].
In this sense, the main goal of this work was to evaluate the influence of six distance functions, namely the Euclidean, chi-square, Manhattan, Canberra, squared chi-squared and Bray-Curtis distances, on the performance of the recent optimum path forest classifier in detecting and identifying, from ultrasonic signals, the kinetics of the phase transformations of a Ni-based alloy thermally aged at 650 and 950 °C for 10, 100 and 200 h, as well as in the as-welded state. Raw background echo and backscattered ultrasonic signals acquired with two transducers (4 and 5 MHz) were used. The performance of each distance function was assessed in terms of accuracy rate, training and test times, confusion matrix and the harmonic mean of specificity and sensitivity, and the results obtained were very satisfactory.
The OPF has been evaluated in different applications, such as EEG signal classification for epilepsy diagnosis [14], ECG arrhythmia classification [15], automatic characterization of graphite particles in metallographic images [16], intrusion detection in computer networks [17], automatic classification of aquatic weeds [18] and spoken emotion recognition [19], among others.

Materials and Methods
This section describes the experimental work carried out for the samples aged at 650 and 950 °C for 10, 100 and 200 h, as well as for the as-welded state. First, the acquired ultrasonic sensor signals and the related fundamentals are introduced. Afterwards, the optimum path forest classifier used to classify the ultrasonic signals is presented. Finally, the metrics used to evaluate the classifier are described.

Ultrasonic Sensor Signals
After the welding and preparation of the samples, described in detail in [20,21], the background echo and backscattered signals were acquired to evaluate the effect of aging on the inconel 625 alloy samples.
The pulse echo technique and the direct contact method were used to collect the background echo and backscattered ultrasonic signals [8]. All signals were obtained using commercial nondestructive testing (NDT) ultrasonic transducers: one of 4 MHz (Krautkramer, Model MB4S, Lewistown, PA, USA) and another of 5 MHz (Krautkramer, Germany, Model MSW-QCG). The choice of these transducers was based on the authors' previous experience with this kind of NDT and their knowledge of the material under study [22][23][24][25]. In fact, Albuquerque et al. [21] showed that these frequencies were the most adequate for analyzing this material: a transducer with a frequency of 10 MHz completely attenuated the ultrasonic signal, and one with a frequency of 2.25 MHz produced an adjacent echo that extensively overlapped the signal, seriously compromising the accuracy of the results.
SAE 15W40 lubricating oil was used as the couplant for the longitudinal measurements. A Krautkramer ultrasonic device (GE Inspection Technologies, Lewistown, PA, USA, model USD15B) was connected to a 100-MHz digital oscilloscope (Tektronix, Portland, OR, USA, model TDS3012B), which transmitted the ultrasonic signals to a computer for processing at a sampling rate of 1 GS/s. The microstructural characterization was carried out using the OPF classifier configured with the Euclidean, chi-square, Manhattan, Canberra, squared chi-squared and Bray-Curtis distances on the original background echo and backscattered signals. To ensure statistical significance, 40 signals were acquired for each sample. Each background echo signal had 10,000 points, giving 400,000 points for the 40 signals, and each backscattered signal had 500 points, giving 20,000 points for the 40 signals.
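To make the distance functions concrete, the following Python sketch implements four of the six distances on raw signals treated as NumPy vectors (for example, a 500-point backscattered signal). It uses common textbook definitions, which is an assumption: the formulas used by the authors are not given here, and the two chi-square variants are omitted because their exact form (including how the negative amplitudes of raw signals are handled) is not specified. The function names are illustrative.

```python
import numpy as np

def euclidean(x, y):
    # Square root of the sum of squared point-by-point differences
    return float(np.sqrt(np.sum((x - y) ** 2)))

def manhattan(x, y):
    # Sum of absolute point-by-point differences (L1 norm)
    return float(np.sum(np.abs(x - y)))

def canberra(x, y):
    # Sum of |x_i - y_i| / (|x_i| + |y_i|); 0/0 terms are treated as 0
    num = np.abs(x - y)
    den = np.abs(x) + np.abs(y)
    nonzero = den > 0
    return float(np.sum(num[nonzero] / den[nonzero]))

def bray_curtis(x, y):
    # Sum of |x_i - y_i| divided by sum of |x_i + y_i| (assumes a nonzero denominator)
    return float(np.sum(np.abs(x - y)) / np.sum(np.abs(x + y)))

# Usage: two synthetic 500-point signals standing in for backscattered signals
rng = np.random.default_rng(0)
s1, s2 = rng.normal(size=500), rng.normal(size=500)
for name, d in [("Euclidean", euclidean), ("Manhattan", manhattan),
                ("Canberra", canberra), ("Bray-Curtis", bray_curtis)]:
    print(f"{name}: {d(s1, s2):.3f}")
```

Each function takes two equal-length signals and returns a scalar arc weight, which is the form in which a distance enters the OPF's complete graph.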
Albuquerque et al. [21] did not use echo signals without preprocessing, arguing that the large number of points made their use impracticable. This limitation is overcome here because the classifier used is faster and more powerful, which is one of the important contributions of this work. Nunes et al. [20] compared the OPF, configured only with the Euclidean distance, with support vector machine and Bayesian classifiers and showed its superiority in terms of processing time and accuracy rate. Thus, another contribution of this work is the analysis of the influence of six distance functions on the OPF's performance in detecting, from the ultrasonic signals, the microstructural changes due to aging.

Optimum Path Forest Classifier
The OPF classifier models the pattern recognition problem as a graph partition in a given feature space. The nodes are the ultrasonic signal feature vectors, and all pairs of nodes are connected by edges, defining a complete graph. This representation is straightforward, since the graph does not need to be explicitly stored, and has low memory requirements. The graph is partitioned by a competition process among some key samples, known as prototypes, which offer optimum paths to the remaining nodes of the graph. Each prototype defines its optimum path tree (OPT), and the collection of all OPTs defines the optimum path forest, which gives the classifier its name [26].
The OPF can be seen as a generalization of the well-known Dijkstra algorithm for computing optimum paths from a source node to the remaining ones [27]. The main difference lies in the fact that the OPF uses a set of source nodes, i.e., the prototypes, and can use any path-cost function. Dijkstra's algorithm uses a function that sums the arc weights along a path, whereas the OPF used here uses a function that gives the maximum arc weight along a path [26].
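A small sketch with hypothetical arc weights shows why the choice of path-cost function matters: an additive cost, as in Dijkstra's algorithm, prefers one long hop over several short ones, whereas the maximum-arc cost used by the OPF prefers a chain of close samples. The function names are illustrative.

```python
def f_sum(arc_weights):
    # Additive path cost, as used by Dijkstra's algorithm
    return sum(arc_weights)

def f_max(arc_weights):
    # OPF path cost: largest arc weight along the path.
    # Returns 0.0 for an empty arc list; in the OPF, the 0 / +infinity costs
    # of trivial paths are set by the initialization, not by this function.
    return max(arc_weights, default=0.0)

path_a = [1.0, 1.0, 1.0, 1.0]  # four short hops through close samples
path_b = [3.0]                 # a single long hop

print(f_sum(path_a), f_sum(path_b))  # 4.0 3.0 -> additive cost prefers path_b
print(f_max(path_a), f_max(path_b))  # 1.0 3.0 -> f_max prefers path_a
```

This is why, under f_max, a prototype can conquer samples that are far from it in a straight line but reachable through a dense chain of samples of its own class.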
Let Z = Z₁ ∪ Z₂ be a dataset labeled by a function λ, in which Z₁ and Z₂ are, respectively, the training and test sets, and let S ⊆ Z₁ be a set of prototype samples (ultrasonic signal feature vectors). Essentially, the OPF classifier builds a discrete optimal partition of the feature space such that any sample s ∈ Z₂ can be classified according to this partition. This partition is an optimum path forest computed in ℝⁿ by the image foresting transform (IFT) algorithm [28].
The OPF algorithm may be used with any smooth path-cost function that can group ultrasonic signal features with similar properties [28]. This work used the path-cost function f_max, which is computed as:
$$f_{\max}(\langle s \rangle) = \begin{cases} 0 & \text{if } s \in S, \\ +\infty & \text{otherwise,} \end{cases} \qquad f_{\max}(\pi \cdot \langle s, t \rangle) = \max\{ f_{\max}(\pi),\, d(s,t) \}, \tag{1}$$
in which d(s, t) is the distance between ultrasonic signal features s and t, a path π is a sequence of adjacent features, and π · ⟨s, t⟩ is the extension of a path π ending at s by the arc (s, t). As such, f_max(π) computes the maximum distance between adjacent samples in π when π is not a trivial path. The OPF algorithm assigns one optimum path P*(s) from S to every ultrasonic signal feature s ∈ Z₁, originating an optimum path forest P (a function with no cycles that assigns to each s ∈ Z₁\S its predecessor P(s) in P*(s), or a marker nil when s ∈ S). Let R(s) ∈ S be the root of P*(s), which can be reached by following the predecessors P(s). The OPF algorithm computes, for each s ∈ Z₁, the cost C(s) of P*(s), the label L(s) = λ(R(s)) and the predecessor P(s).
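The IFT computation described above can be sketched in Python as a Dijkstra-like loop over the complete graph with f_max as the path cost. This is a minimal O(n²) illustration that assumes the prototype indices are already known; it is not the authors' implementation, and the names `opf_forest` and `dist` are hypothetical. The returned arrays correspond to the cost C, label L and predecessor P maps defined in the text, with -1 standing in for nil.

```python
import numpy as np

def opf_forest(X, labels, prototypes, dist):
    """Cost C, label L and predecessor P of every training sample under f_max."""
    n = len(X)
    assert len(prototypes) > 0, "at least one prototype is required"
    C = np.full(n, np.inf)            # trivial-path cost: +inf for non-prototypes
    L = np.array(labels).copy()
    P = np.full(n, -1)                # -1 plays the role of the marker nil
    C[list(prototypes)] = 0.0         # trivial-path cost: 0 for prototypes
    done = np.zeros(n, dtype=bool)
    for _ in range(n):
        # Take the pending sample with the lowest current cost
        s = int(np.argmin(np.where(done, np.inf, C)))
        done[s] = True
        for t in range(n):            # complete graph: every other sample is adjacent
            if done[t]:
                continue
            cost = max(C[s], dist(X[s], X[t]))   # f_max of the path extended by (s, t)
            if cost < C[t]:                      # s offers t a cheaper path
                C[t], L[t], P[t] = cost, L[s], s
    return C, L, P

# Usage on a toy 1-D training set with two classes and prototypes 1 and 2
X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = [0, 0, 1, 1]
manhattan = lambda a, b: float(np.abs(a - b).sum())
C, L, P = opf_forest(X, labels, prototypes=[1, 2], dist=manhattan)
print(C, L, P)
```

Because the graph is complete, every sample obtains a finite cost as soon as the first prototype is processed, so the loop never selects an unreachable node.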
The OPF classifier has two distinct phases: (I) training and (II) classification. The former consists essentially of finding the prototypes and computing the optimum path forest, which is the union of all OPTs rooted at the prototypes. In the classification phase, each sample of the test set is connected to all samples of the optimum path forest generated in the training phase. The test sample is not permanently added to the training set; that is, it is connected only once, in order to be classified. The next sections describe this procedure in more detail.

Training
We say that S* is an optimum set of prototypes when the OPF algorithm minimizes the classification errors for every s ∈ Z₁. S* can be found by exploiting the theoretical relation between the minimum spanning tree (MST) and the optimum path tree for f_max [29]. Training essentially consists of finding S* and an OPF classifier rooted at S*.
By computing an MST in the complete graph (Z₁, A), we obtain a connected acyclic graph whose nodes are all the ultrasonic signal features of Z₁ and whose arcs are undirected and weighted by the distances d between adjacent features. The minimum spanning tree is the spanning tree with the smallest sum of arc weights among all spanning trees of the complete graph. In the MST, every pair of ultrasonic signal features is connected by a single path that is optimum according to f_max. That is, the minimum spanning tree contains one optimum path tree for any selected root node.
The optimum prototypes are the closest elements of the MST with different labels in Z₁, i.e., elements that fall on the frontier between classes. By removing the arcs between different classes, their adjacent features become prototypes in S*, and the OPF can compute an optimum path forest with minimum classification errors in Z₁. Note that a given class may be represented by multiple prototypes, i.e., optimum path trees, and there must be at least one prototype per class.
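Prototype selection can be sketched with Prim's algorithm on the complete graph: every MST arc that joins samples with different labels contributes both of its endpoints to S*. This is an illustrative sketch (the name `find_prototypes` is hypothetical); ties between equal-weight arcs are broken arbitrarily, so it may pick a different, equally valid MST than another implementation would.

```python
import numpy as np

def find_prototypes(X, labels, dist):
    """Endpoints of MST arcs whose two samples have different labels."""
    n = len(X)
    in_tree = np.zeros(n, dtype=bool)
    best = np.full(n, np.inf)    # cheapest arc linking each sample to the tree
    parent = np.full(n, -1)      # tree neighbour through which that arc goes
    best[0] = 0.0                # start Prim's algorithm from sample 0
    prototypes = set()
    for _ in range(n):
        s = int(np.argmin(np.where(in_tree, np.inf, best)))
        in_tree[s] = True
        # The MST arc (parent[s], s) crosses a class frontier: keep both ends
        if parent[s] >= 0 and labels[s] != labels[parent[s]]:
            prototypes.update((s, int(parent[s])))
        for t in range(n):
            if not in_tree[t]:
                d = dist(X[s], X[t])
                if d < best[t]:
                    best[t], parent[t] = d, s
    return sorted(prototypes)

# Usage on a toy 1-D training set with two classes
X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = [0, 0, 1, 1]
manhattan = lambda a, b: float(np.abs(a - b).sum())
print(find_prototypes(X, labels, manhattan))  # [1, 2]
```

When at least two classes are present, the node set of each class must be joined to the rest of the spanning tree by at least one arc, so this procedure yields at least one prototype per class.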

Classification
For any ultrasonic signal feature t ∈ Z₂, all arcs connecting t with samples s ∈ Z₁ are considered, as though t were part of the training graph. Considering all possible paths from S* to t, the optimum path P*(t) from S* is found, and t is labeled with the class λ(R(t)) of its most strongly connected prototype R(t) ∈ S*. This path can be identified incrementally by evaluating the optimum cost C(t) as:
$$C(t) = \min_{\forall s \in Z_1} \left\{ \max\{ C(s),\, d(s,t) \} \right\}. \tag{2}$$
Let the node s* ∈ Z₁ be the one that satisfies Equation (2), i.e., the predecessor P(t) in the optimum path P*(t). Given that L(s*) = λ(R(t)), the classification simply assigns L(s*) as the class of t. An error occurs when L(s*) ≠ λ(t).
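The classification rule above can be sketched directly: the test sample is offered a path through every training sample, and it takes the label of the training sample that minimizes the f_max cost. The trained costs and labels below are a hard-coded toy forest (hypothetical values), and `opf_classify` is an illustrative name. Efficient implementations may sort the training samples by cost to stop the search early; this sketch simply evaluates all of them.

```python
import numpy as np

def opf_classify(X_train, C, L, x_test, dist):
    """Label x_test with the label of the training sample s minimizing max{C(s), d(s, t)}."""
    costs = np.array([max(C[i], dist(X_train[i], x_test)) for i in range(len(X_train))])
    s_star = int(np.argmin(costs))   # s*, i.e. the predecessor P(t) of the test sample
    return int(L[s_star]), float(costs[s_star])

# Hypothetical trained forest: 1-D features with their costs C and labels L
X_train = np.array([[0.0], [1.0], [10.0], [11.0]])
C = np.array([1.0, 0.0, 0.0, 1.0])
L = np.array([0, 0, 1, 1])
manhattan = lambda a, b: float(np.abs(a - b).sum())

print(opf_classify(X_train, C, L, np.array([2.0]), manhattan))  # (0, 1.0)
print(opf_classify(X_train, C, L, np.array([9.0]), manhattan))  # (1, 1.0)
```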

Performance Evaluation Metrics
In order to analyze the performance of the machine learning technique used, three metrics were employed: accuracy, sensitivity and specificity.
Accuracy (Acc) is defined as the ratio of the number of correctly classified samples to the total number of samples:
$$\text{Accuracy} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}}. \tag{3}$$
Sensitivity (Se) is the ratio of the number of samples of a given class that are correctly classified to the total number of samples that actually belong to that class, including the misclassified ones:
$$\text{Sensitivity} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}, \tag{4}$$
in which true positives and false negatives stand for the number of samples of a given class correctly and incorrectly classified, respectively. Specificity (Sp) is the ratio of the number of samples not belonging to a given class that are correctly classified as not belonging to it to the total number of samples not belonging to that class:
$$\text{Specificity} = \frac{\text{true negatives}}{\text{true negatives} + \text{false positives}}, \tag{5}$$
in which true negatives stands for the number of samples not belonging to a given class that are classified as not belonging to it, while false positives stands for the number of samples incorrectly classified as belonging to that class. These last two measures are computed for each class. Furthermore, we also propose the use of the harmonic mean (HM) of sensitivity and specificity:
$$HM = \frac{2 \cdot Se \cdot Sp}{Se + Sp}. \tag{6}$$
These evaluation metrics can be computed from a confusion matrix, obtained by comparing the expected classification (reference data) with the classes predicted by the classifier. Besides these measures of the classifier's effectiveness, we also compute the training and testing times.
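The accuracy, sensitivity, specificity and harmonic mean defined above can be computed from a confusion matrix as sketched below, assuming that rows hold the reference (true) classes and columns the predicted classes, and treating each class one-versus-rest. How the per-class values are aggregated into the single HM reported in the tables is not stated, so this sketch returns them per class only; the function name and the example matrix are hypothetical.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Accuracy plus per-class sensitivity, specificity and harmonic mean (one-vs-rest)."""
    cm = np.asarray(cm, dtype=float)  # rows: reference class, columns: predicted class
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # samples of the class predicted as another class
    fp = cm.sum(axis=0) - tp          # samples of other classes predicted as this class
    tn = total - tp - fn - fp
    accuracy = tp.sum() / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    hm = 2 * sensitivity * specificity / (sensitivity + specificity)
    return accuracy, sensitivity, specificity, hm

# Usage with a hypothetical two-class confusion matrix (20 test samples per class)
acc, se, sp, hm = metrics_from_confusion([[18, 2], [4, 16]])
print(acc, se, sp, hm)
```

For this matrix the accuracy is 34/40 = 0.85, the sensitivities are 0.9 and 0.8, the specificities are 0.8 and 0.9, and both classes have the same harmonic mean, 2 · 0.9 · 0.8 / 1.7 ≈ 0.847.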

Results and Discussion
The original ultrasonic background echo and backscattered signals, acquired using 4- and 5-MHz transducers, were classified using the OPF classifier configured with the Euclidean, chi-square, Manhattan, Canberra, squared chi-squared and Bray-Curtis distances. The classification efficiency (processing time) and efficacy (accuracy rate, confusion matrices and harmonic mean) were analyzed in order to evaluate the classifier's ability to identify the microstructural classes. For all distance metrics, the original signals were partitioned using the holdout method (50% for training and 50% for testing). The means and standard deviations of the accuracy, harmonic mean and processing time were computed over 10 randomly generated runs. The experiments were performed on a personal computer with an Intel Core i3 processor at 3 GHz and 3 GB of RAM, running Linux Ubuntu as the operating system.
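The evaluation protocol (holdout with 50% of the signals for training and 50% for testing, repeated over 10 random runs, reporting the mean and standard deviation) can be sketched as follows. Two assumptions are made: the split is stratified per class, which the text does not state, and `fit_predict` is a hypothetical callback into which any classifier can be plugged. The usage example plugs in a 1-nearest-neighbour classifier with the Manhattan distance as a simple stand-in; it is not the OPF.

```python
import numpy as np

def repeated_holdout(X, y, fit_predict, runs=10, train_frac=0.5, seed=0):
    """Mean and standard deviation of accuracy over repeated random holdout splits."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(runs):
        train_idx, test_idx = [], []
        # Assumption: stratified split, i.e. the same fraction of each class goes to training
        for c in np.unique(y):
            idx = rng.permutation(np.flatnonzero(y == c))
            k = int(round(train_frac * len(idx)))
            train_idx.extend(idx[:k])
            test_idx.extend(idx[k:])
        train_idx, test_idx = np.array(train_idx), np.array(test_idx)
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        accuracies.append(np.mean(y_pred == y[test_idx]))
    return float(np.mean(accuracies)), float(np.std(accuracies))

def nn_manhattan(X_train, y_train, X_test):
    """Stand-in classifier: 1-nearest neighbour with the Manhattan distance (not OPF)."""
    d = np.abs(X_test[:, None, :] - X_train[None, :, :]).sum(axis=2)
    return y_train[np.argmin(d, axis=1)]

# Toy data: two well-separated classes with 40 "signals" of 5 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (40, 5)), rng.normal(5.0, 0.1, (40, 5))])
y = np.repeat([0, 1], 40)
mean_acc, std_acc = repeated_holdout(X, y, nn_manhattan)
print(f"accuracy = {mean_acc:.4f} +/- {std_acc:.4f}")
```

With 40 signals per class and a 50% split, each run leaves 20 test signals per class, which is the scale of the per-class counts discussed in the confusion matrices.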

Efficiency and Effectiveness Analysis
The performance of each distance used by the OPF classifier was assessed through the accuracy rate, the harmonic mean of specificity and sensitivity, the processing time of the training and testing phases and, finally, the confusion matrix. As shown in Table 1, which presents the accuracy rate and harmonic mean for the 4- and 5-MHz signals of the samples aged at 650 °C, the highest accuracy was achieved using the Manhattan distance (88.75%, highlighted in bold in the table). This value is 3.75 percentage points higher than the second-best result, obtained with the Euclidean distance (85%). Table 2 shows the mean confusion matrix for the 4-MHz backscattered signals. The Euclidean and Manhattan distances clearly outperformed the other distances: these two correctly classified, on average, 16 to 18 samples per class, whereas the others confused a large part of the samples, most of which were classified as belonging to the 200-h class.
Table 3 presents the mean confusion matrix for the 4-MHz pulse echo signals. Here, the classification was more spread across the classes, since 7 to 14 samples per class were classified correctly, with the remaining ones distributed among the other classes.
Table 4 presents the confusion matrix for the 5-MHz backscattered signals. In this case, the performance of the Euclidean and Manhattan distances was around 25% below that obtained for the 4-MHz backscattered signals: the Euclidean distance correctly classified between 8 and 15 samples of each class, and the Manhattan distance between 10 and 14. The other distances correctly classified, on average, fewer than 10 samples per class.
Table 5 shows the confusion matrix for the 5-MHz pulse echo signals. Here, the Manhattan distance correctly classified, on average, 11 to 16 samples per class, reaching 70% accuracy in the best case and 58.75% in the worst, whereas the Euclidean distance correctly classified, on average, 10 to 17 samples per class, reaching a maximum of 75% and a minimum of 61.25%. The squared chi-squared distance is also noteworthy, with an accuracy rate of 63.75% in its best result; the other distances correctly classified, on average, 1 to 15 samples per class.
Table 6 presents the average training and testing times, in milliseconds, for the 4- and 5-MHz signals. All distances achieved similar training and testing times for the backscattered signals, around 0.2 ms for training and 0.1 ms for testing. For the pulse echo signals, the average times ranged from 0.5 to 1 ms for training and from 0.2 to 0.3 ms for testing.
To determine the best classification performance for the samples aged at 950 °C, we analyzed the classification of each round in detail. As shown in Table 7, which presents the accuracy rates and harmonic mean for the 4- and 5-MHz signals, the Manhattan distance achieved 60% accuracy for the 4-MHz pulse echo and backscattered signals and 60.63% for the 5-MHz pulse echo signals. However, the Euclidean distance with the 5-MHz pulse echo signals may be considered more suitable for the classification, since it achieved 70% accuracy in its best case and 53.75% in its worst case, whereas the results of the Manhattan distance ranged from 57.5% to 60% for the 4-MHz backscattered signals and from 48.75% to 63.75% for the 4-MHz pulse echo signals. All rounds using the 4-MHz backscattered signals with the Euclidean distance achieved an accuracy rate of 57.5%.
In general, the classifications of the 5-MHz backscattered signals were very unsatisfactory, with accuracy rates below 40%. Table 8 shows that, for the 4-MHz backscattered signals, all distances correctly classified more than 10 samples of the 0-h and 10-h classes in at least one round, whereas only the Euclidean and Manhattan distances did the same for the 100-h and 200-h classes.
Table 9 presents the mean confusion matrix for the 4-MHz pulse echo signals. Only the Euclidean and Manhattan distances correctly classified more than 50% of the samples of each class, with the Euclidean distance reaching a maximum accuracy of 63.75% and the Manhattan distance a maximum of 60%.
The data in Table 10 show that, for the 5-MHz backscattered signals, the performance was generally poor. The Manhattan distance achieved a maximum accuracy of 45% and a minimum of 31.25%, the Euclidean distance a maximum of 37.5% and a minimum of 36.25%, and the other distances remained below 35%.
Table 11 shows the mean confusion matrix for the 5-MHz pulse echo signals. The best performance for the temperature of 950 °C was achieved by the Euclidean distance, with a maximum of 70%, followed by the Manhattan distance, with 68.75%; the other distances ranged between 22.5% and 57.5%. Table 12 shows the average training and testing times, in milliseconds, for the 4- and 5-MHz signals. All distances achieved similar training and testing times for the backscattered signals, around 0.2 ms for training and 0.1 ms for testing. For the pulse echo signals, the average times ranged from 0.2 to 0.4 ms for training and were around 0.1 ms for testing.
For the combined classification of the samples aged at 650 and 950 °C (Table 13), the best accuracy was achieved by the Euclidean distance, with 65.86%, but the highest harmonic mean belongs to the Manhattan distance, with 83.5%. This is because, in some rounds, the Manhattan distance outperformed the Euclidean distance: the best classification with the Manhattan distance reached 71.4%, whereas the best accuracy achieved by the Euclidean distance was 67.86%. The other distances achieved accuracy rates lower than 30%, with the chi-squared distance reaching 10.71% in its worst classification.
Regarding the processing times (Table 14), the distances that classified the samples correctly more often were those that took longer to train: the Manhattan distance took between 0.5 and 1 ms and the Euclidean distance between 0.4 and 0.6 ms. For testing, the Manhattan distance took from 0.16 to 0.3 ms, whereas the Euclidean distance took from 0.18 to 0.24 ms. The pulse echo signals took the longest to train the classifier, around 0.8 ms for all distances.
For the 4-MHz backscattered signals, as shown in Table 15, the Manhattan distance correctly classified many of the samples of the classes associated with the temperature of 650 °C, with an average of 15 to 18.5 correctly classified samples, whereas for the classes associated with the temperature of 950 °C it achieved an average of 7.5 to 11 correct classifications. Similarly, the Euclidean distance correctly classified 10 to 15 samples of the classes associated with 650 °C and 7 to 13 samples of the classes associated with 950 °C. The remaining distances frequently assigned samples to the 650 °C/200 h class, with an average of 11 to 18 samples of every class classified as belonging to it.
For the 4-MHz pulse echo signals (Table 16), the classification was less accurate, with misclassifications in all classes for all distances. The Manhattan distance correctly classified, on average, 8 to 13.5 samples in most of its classes, whereas the Euclidean distance correctly classified 6.5 to 13 samples. The squared chi-squared and Canberra distances achieved similar results, with an average of 5 to 12.5 correctly classified samples, the Bray-Curtis distance 5 to 10.5, and the chi-squared distance fewer than 10.
Table 17 presents the confusion matrix for the 5-MHz backscattered signals. All distances performed poorly, except for the 650 °C/10 h class, for which the Manhattan and Euclidean distances achieved the best accuracy rates, 42.14% and 40.71%, respectively. The best results of the other distances were below 30%.
Table 18, which shows the confusion matrix for the 5-MHz pulse echo signals, confirms that the performance of all distances was better than in Table 17, but still lower than in Table 15. The Manhattan distance correctly classified, on average, 9 to 15 samples per class, with a maximum accuracy of 64.29% and a minimum of 48.57%. The Euclidean distance correctly classified, on average, 9 to 14.5 samples per class, with a maximum accuracy of 59.29% and a minimum of 47.14%. The squared chi-squared distance achieved a maximum accuracy of 47.86% and a minimum of 34.29%; the Canberra distance a maximum of 36.42% and a minimum of 29.29%; the Bray-Curtis distance a maximum of 30% and a minimum of 22.14%; and, finally, the chi-squared distance a maximum of 22.14% and a minimum of 11.43%.

Conclusions
This work evaluated the efficiency and efficacy of the OPF classifier, configured with six distance functions, in classifying raw background echo and backscattered ultrasonic signals acquired at frequencies of 4 and 5 MHz, in order to characterize the phase transformations of an Nb-bearing Ni-based alloy thermally aged at 650 and 950 °C for 10, 100 and 200 h, as well as in the as-welded condition.
From this work, the following conclusions can be drawn:
(1) The classification of the ultrasonic signals using the OPF classifier was sensitive to the microstructural changes occurring in the Inconel 625 alloy, and both the formation of secondary phases during the welding process and the phase transformation kinetics due to the different thermal aging times can be efficiently identified;
(2) The best accuracy rates for thermal aging at 650 °C were obtained using the OPF configured with the Manhattan distance on the backscattered signals acquired with the 4-MHz transducer (accuracy of 88.75%, in 0.384 ms, with a harmonic mean of 61.9);
(3) For thermal aging at 950 °C, the best results were obtained using the OPF with the Euclidean distance on the background echo signals acquired with the 5-MHz transducer (accuracy of 60%, in 0.569 ms, with a harmonic mean of 69.19);
(4) For thermal aging at 650 and 950 °C, the best results were obtained using the OPF with the Euclidean distance on the backscattered signals acquired with the 4-MHz transducer (accuracy of 67.86%, in 0.713 ms, with a harmonic mean of 80.61).
Based on the accuracy with which the ultrasonic signals were classified, the OPF classifier is able to assess the aging conditions to which the Inconel 625 alloy is submitted, making it possible to determine the best moment to carry out maintenance and thereby reducing maintenance costs and time.