Ultrasound Evaluation of the Primary α Phase Grain Size Based on Generative Adversarial Network

Because experimental data are costly to acquire, the sample sets available for ultrasonic evaluation of microstructure are small, which lowers the accuracy of the evaluation model. To address this small-sample problem, the sample set can be expanded with virtual samples. In this study, an ultrasound evaluation method for the primary α phase grain size based on virtual samples generated by a generative adversarial network (GAN) was developed, with TC25 titanium alloy forgings as the research object. Virtual samples were generated by a GAN whose generator and discriminator are fully connected networks of different sizes. A virtual sample screening mechanism, with the optimization rate as the validity criterion, was constructed to obtain the virtual sample set. An ultrasound evaluation optimization problem targeting accuracy was then formulated and solved with support vector machine regression to obtain the final ultrasound evaluation model. A benchmark function was used to verify the effectiveness of the method, and a series of experiments and comparisons were performed on the ultrasound evaluation model using test samples. The results show that effective virtual samples increase the learning accuracy obtained from the original small samples, and that the ultrasound evaluation model built with the proposed method has higher accuracy and better stability than the other models.


Introduction
For large titanium alloy forgings, which are costly, quality inspection usually relies on metallographic analysis of accompanying test specimens and dissected samples, together with full-coverage ultrasonic testing evaluated against a noise wave height threshold [1][2][3]. Titanium alloys have a complex microstructure, whereas the noise wave height threshold method relies on a single criterion; hence, it struggles to meet the quality inspection requirements arising from the upgrading of aero-engine manufacturing processes in China. Quantitative methods for ultrasonic evaluation of the titanium alloy metallographic structure are therefore needed. TC25 is an α+β heat-resistant titanium alloy with good high-temperature strength, thermal stability, and corrosion resistance, which makes it an ideal material for aero-engines [4]. The microstructural parameters most important for the mechanical properties of bimodal titanium alloys are the size and volume fraction of the primary α phase, and the volume fraction is closely related to the size. The primary α phase is therefore the key metallographic constituent of dual-phase titanium alloys, and its quantitative detection can be used to evaluate the manufacturing quality of forgings effectively [5].
The scattering of ultrasonic waves by the titanium alloy microstructure is the foundation of ultrasound evaluation. Blodgett [6] and Yang et al. [7,8] studied the

Definition of Virtual Samples
Let the original training sample set be $(x, y)$, where $x \in \mathbb{R}^n$. A generation relationship $K$, derived from a priori knowledge, is used to generate new inputs $x_{vir}$. A transformation relationship $\hat{H}$ is then defined such that $y_{vir} = \hat{H}(x_{vir})$; the newly generated samples $(x_{vir}, y_{vir})$ are called "virtual samples", where $x_{vir}$ and $y_{vir}$ are their input and output values, respectively.
Virtual samples are divided into effective and invalid virtual samples. Only effective virtual samples can enrich the small-sample space [20], thereby increasing the learning accuracy obtained from the small sample set and reducing the prediction error of the model [21,22].
When building the virtual sample space, the prediction model $\hat{H}$ of the actual small sample set can be constructed with common machine-learning methods, such as multiple linear regression and SVM. However, $\hat{H}$ can be used to generate the output $y$ corresponding to an input attribute $x$ only when the mean absolute percentage error (MAPE) of the prediction model is less than 10% [23].
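The 10% MAPE condition can be checked in a few lines. The sketch below fits $\hat{H}$ by multiple linear regression (one of the two methods named above) with NumPy least squares and tests the criterion. Evaluating the MAPE on the same small sample set used for fitting is an assumption, since the text does not say which data the MAPE is computed on; the function names are hypothetical.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (assumes no zero targets)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def fit_mlr(X, y):
    """Multiple linear regression y ~ b0 + X @ b, fitted by least squares."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def h_hat_is_usable(X, y, threshold=10.0):
    """Fit H-hat and report whether its MAPE (here: in-sample, an assumption) is below 10%."""
    predict = fit_mlr(X, y)
    err = mape(y, predict(X))
    return predict, err, err < threshold
```

If the returned flag is False, the regression model is too inaccurate to label virtual inputs, and a different learner for $\hat{H}$ would be needed.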

Introduction of Virtual Samples
Poggio et al. [16] first introduced the concept of virtual samples and provided relevant ideas for solving the small sample problem. Niyogi et al. [24] proved mathematically that virtual samples constructed from a priori knowledge can provide valid information as well. Subsequently, virtual sample generation technology has been applied widely in many fields, such as image recognition, soft measurement, medical diagnosis, and fault diagnosis [25][26][27][28][29]. Virtual sample generation (VSG) methods are generally classified into three categories.
Sampling-based VSG generates random variates: new examples are drawn from a specified distribution. For example, parametric distributions such as the Weibull and Gaussian distributions have been used to improve the data distribution of multimodal small-sample datasets [30,31]. This method is often used to synthesize new samples in imbalanced datasets [32].
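A minimal sketch of sampling-based VSG, assuming an independent Gaussian fitted to each attribute (one of the parametric choices mentioned above). Because each column is sampled separately, correlations between attributes are lost, which illustrates the weakness of this approach for correlated variables. `gaussian_vsg` is a hypothetical name.

```python
import numpy as np

def gaussian_vsg(samples, n_new, seed=None):
    """Draw n_new virtual samples; attribute j ~ N(mean_j, std_j), fitted independently.

    Independence between attributes is the key (and limiting) assumption here.
    """
    samples = np.asarray(samples, dtype=float)
    mu = samples.mean(axis=0)
    sigma = samples.std(axis=0, ddof=1)
    rng = np.random.default_rng(seed)
    # loc/scale of shape (d,) broadcast against size (n_new, d)
    return rng.normal(mu, sigma, size=(n_new, samples.shape[1]))
```

On two strongly correlated real attributes, the virtual samples reproduce the means but have near-zero correlation.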
Information-diffusion-based VSG is based on diffusion theory. It estimates the acceptable range of each attribute of the small sample set, generates new samples through a diffusion function, and fills the information gaps between samples, thereby expanding the sample set; examples include mega-trend-diffusion (MTD) [18] and multi-distribution mega-trend-diffusion (MD-MTD) [19].
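The range-estimation step of MTD can be sketched for a single attribute as follows. The bounds use the commonly cited form of MTD [18]: the center $CL = (\min + \max)/2$, skewness weights from the counts below and above $CL$, and a bound membership value $\varphi = 10^{-20}$. Drawing values by rejection sampling against a triangular membership function peaked at $CL$ is one common way to generate samples from the estimated range; these details are assumptions relative to this paper, and the function names are hypothetical.

```python
import numpy as np

PHI = 1e-20  # membership value assigned to the diffused bounds L and U

def mtd_bounds(x):
    """Estimate the acceptable range [L, U] of one attribute by mega-trend-diffusion."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    cl = (lo + hi) / 2.0
    n_l, n_u = int(np.sum(x < cl)), int(np.sum(x > cl))
    total = max(n_l + n_u, 1)
    var = x.var(ddof=1)
    spread = -2.0 * np.log(PHI)  # positive because log(PHI) < 0
    L = cl - (n_l / total) * np.sqrt(spread * var / n_l) if n_l else lo
    U = cl + (n_u / total) * np.sqrt(spread * var / n_u) if n_u else hi
    return min(L, lo), max(U, hi), cl  # never narrower than the observed data

def mtd_sample(x, n_new, seed=None):
    """Rejection-sample n_new values using a triangular membership peaked at CL."""
    L, U, cl = mtd_bounds(x)
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < n_new:
        v = rng.uniform(L, U)
        if v <= cl:
            m = (v - L) / (cl - L) if cl > L else 1.0
        else:
            m = (U - v) / (U - cl) if U > cl else 1.0
        if rng.uniform() <= m:  # accept with probability equal to the membership
            out.append(v)
    return np.array(out)
```

Each attribute is diffused on its own, which is why MTD tends to ignore the joint effect of the attributes.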
Deep-learning-based VSG uses a deep generative model. Its core idea is to learn the data distribution of real samples and generate new synthetic sample data from it. The most common deep generative model is the generative adversarial network (GAN) [33].
The first method has a relatively low computational cost but becomes ineffective when the variables are highly correlated. The second method depends on the extended range of each attribute; it is computationally intensive and tends to ignore the joint effect of the attributes, so it is difficult to apply to high-dimensional small datasets. The third method does not suffer from these disadvantages. In essence, it learns the joint distribution of the input and output variables from the historical dataset, establishes a mapping between this distribution and the real data distribution, and takes the effects of all variables into account together.

Virtual Sample Generation
According to the definition of virtual samples, virtual sample generation consists of two processes: generating $x_{vir}$ through the generation relationship $K$, and establishing a transformation relationship $\hat{H}$ such that $y_{vir} = \hat{H}(x_{vir})$. Because deep-learning-based VSG overcomes the disadvantages of sampling-based and information-diffusion-based VSG, a deep-learning-based generative model was used in this study to generate virtual samples.
The GAN is a deep-learning model and, in recent years, has become one of the most promising methods for unsupervised learning of complex distributions. It is composed of a generator (G) and a discriminator (D). The generator takes random noise $z$ as input and tries to fit the real data distribution $P_{data}$; its output is $G(z)$. The discriminator determines whether an input sample is generated data $G(z)$ or real data $x$. The structure is shown in Figure 1. The GAN trains the two models with the following min-max objective function:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}}\left[\log D(x)\right] + \mathbb{E}_{z \sim P_z}\left[\log\left(1 - D(G(z))\right)\right] \tag{1}$$
where $P_z$ is the noise distribution. The GAN is optimized alternately: G is fixed and D is optimized to maximize its discrimination accuracy; then D is fixed and G is optimized to minimize the discrimination accuracy of D. The global optimum is reached if and only if $P_{data} = P_g$, where $P_g$ is the distribution of the generated data, i.e., the generated data can no longer be distinguished from the real data.
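The value function of the GAN, $V(D, G) = \mathbb{E}[\log D(x)] + \mathbb{E}[\log(1 - D(G(z)))]$, can be estimated from a batch of discriminator outputs. This small sketch (hypothetical helper `gan_value`) shows why the optimum corresponds to $V = -\log 4$: when $P_{data} = P_g$, the best discriminator outputs 1/2 for every input.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """Sample estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on a batch of real samples.
    d_fake: discriminator outputs D(G(z)) on a batch of generated samples.
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# At the global optimum the discriminator outputs 1/2 everywhere:
# V = log(1/2) + log(1/2) = -log 4.
print(gan_value(np.full(10, 0.5), np.full(10, 0.5)))  # about -1.3863
```

A discriminator that separates real from generated samples (for example, 0.9 on real and 0.1 on generated data) yields a larger value, which is what D maximizes and G tries to push back down.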
Assuming that the real training sample set is $D_s = (X, Y)$, the steps for generating the virtual sample set $D_{vir} = (X_{vir}, Y_{vir}) = G(D_s, N_{vir})$ with the GAN are as follows.
(1) The real training sample set $D_s$ is preprocessed and binary coded. The accuracy of the binary coding is
$$\Delta x = \frac{U_{max} - U_{min}}{2^k - 1} \tag{2}$$
where $\Delta x$ denotes the coding accuracy, $U_{min}$ and $U_{max}$ define the range of the decision variables, and $k$ is the coding length. The binary-coded real training sample set $D_s$ is then transformed into a set of black-and-white images, in which code 1 is displayed as black and code 0 as white.
(2) The black-and-white image set is used as the input to train the GAN model. The GAN model is saved when $P_{data} = P_g$, and the saved model is used to generate a set of virtual images.
(3) The virtual image set is decoded and restored to the virtual sample set. The decoding formula is
$$x_j = U_{min} + \frac{U_{max} - U_{min}}{2^k - 1} \sum_{i=1}^{k} b_i \, 2^{i-1} \tag{3}$$
where $b_i$ denotes the code of the $i$-th bit and $x_j$ is the decoded value.
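The coding and decoding steps can be sketched with NumPy as follows, using the coding accuracy $\Delta x = (U_{max} - U_{min})/(2^k - 1)$ and the weighted bit sum for decoding. Storing each value as one row of $k$ bits, most significant bit first, and rounding to the nearest grid level are assumptions about the image layout; the helper names are hypothetical.

```python
import numpy as np

def coding_accuracy(u_min, u_max, k):
    """Grid spacing of a k-bit code on [u_min, u_max]."""
    return (u_max - u_min) / (2**k - 1)

def encode(x, u_min, u_max, k):
    """Encode each value as k bits, most significant bit first (1 = black, 0 = white)."""
    x = np.asarray(x, dtype=float)
    levels = np.rint((x - u_min) / coding_accuracy(u_min, u_max, k)).astype(np.int64)
    levels = np.clip(levels, 0, 2**k - 1)
    shifts = np.arange(k - 1, -1, -1)  # bit positions, MSB first
    return ((levels[..., None] >> shifts) & 1).astype(np.uint8)

def decode(bits, u_min, u_max, k):
    """x = u_min + dx * sum(b_i * 2**(i-1)), with the LSB at the end of each row."""
    bits = np.asarray(bits, dtype=np.int64)
    weights = 2 ** np.arange(k - 1, -1, -1)
    return u_min + coding_accuracy(u_min, u_max, k) * (bits @ weights)
```

Encoding a sample of six values normalized to [0, 100] with $k = 17$ gives a 6 × 17 binary image, and decoding restores each value to within half a grid step.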

Virtual Sample Validity Analysis and Screening Process
Current virtual sample generation techniques can produce large numbers of virtual samples, but many of them have low validity and cannot be used directly. Their validity must therefore be analyzed and screened. Taking "the added virtual samples improve the accuracy of the evaluation model" as the validity criterion, a screening basis is established for the generated virtual samples.
The optimization rate (OTR) is used to measure the validity of virtual samples: the higher its value, the more effective the virtual samples. It is calculated by Equation (4):
$$\mathrm{OTR} = \frac{\mathrm{MAPE}_1 - \mathrm{MAPE}_2}{\mathrm{MAPE}_1} \times 100\%, \qquad \mathrm{MAPE} = \frac{1}{m}\sum_{i=1}^{m}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\% \tag{4}$$
where $m$ is the number of real test samples, $y_i$ is the value of the $i$-th real test sample, $\hat{y}_i$ is its predicted value, and MAPE is the mean absolute percentage error, which measures the error of the prediction model (the smaller its value, the more accurate the model). Here, $\mathrm{MAPE}_1$ is the MAPE between the real test sample values and their predicted values without virtual samples, and $\mathrm{MAPE}_2$ is the corresponding MAPE after virtual samples are added.
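A direct computation of MAPE and the optimization rate, assuming OTR is the relative reduction of the test-set MAPE after virtual samples are added, $\mathrm{OTR} = (\mathrm{MAPE}_1 - \mathrm{MAPE}_2)/\mathrm{MAPE}_1 \times 100\%$. The two prediction vectors would come from models trained without and with the virtual samples; `otr` is a hypothetical name.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def otr(y_test, pred_without_vir, pred_with_vir):
    """Optimization rate (%): relative MAPE reduction on the real test samples."""
    mape1 = mape(y_test, pred_without_vir)  # model trained on real samples only
    mape2 = mape(y_test, pred_with_vir)     # model trained with virtual samples added
    return (mape1 - mape2) / mape1 * 100.0
```

For example, halving the test MAPE from 10% to 5% gives OTR = 50%, while a negative OTR flags virtual samples that made the model worse.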
Because the ultrasound evaluation model predicts the grain sizes of real samples, and the values of real samples are limited by the processing parameters, virtual samples outside the range of the real samples $[X_{min}, X_{max}]$ were screened out to avoid introducing unreasonable data.
When verifying the validity of virtual samples, each iteration generates a virtual sample set of size $N$; the set generated in the $i$-th iteration is $D_{vir}(i) = (X_{vir}(i), Y_{vir}(i)) = G(D_s, N)$, with $X_{vir}(i) \in [X_{min}, X_{max}]$. Here, $D_f(i-1)$ denotes the union of the real training sample set and the valid virtual sample sets $D_{vir}(i-1), \ldots, D_{vir}(1)$ generated in the previous iterations. When judging the validity of … (1) is the optimal virtual sample set. The pseudocode of the effective virtual sample generation (EVSG) algorithm is given in Algorithm 1.
Algorithm 1. /* initialize the number of iterations i, the reconstructed sample set D_f(0), N_vir, and the virtual sample set after initial screening D_vir */
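Because the body of Algorithm 1 is not available here, the following is only a sketch of the screening loop under stated assumptions. Each batch is first range-screened against $[X_{min}, X_{max}]$ and is then kept if adding it to $D_f(i-1)$ lowers the MAPE on the real test samples (i.e., OTR > 0); this acceptance rule is our reading of the text, not the published algorithm. `generate` is a hypothetical callable standing in for a trained GAN plus decoding, and ridge regression is swapped in for SVR so that the example runs with NumPy alone.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def fit_ridge(X, y, lam=1e-3):
    """Stand-in learner (ridge regression); the paper itself uses SVR."""
    A = np.column_stack([np.ones(len(X)), X])
    coef = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def evsg(X_s, y_s, X_test, y_test, generate, n_batch=5, n_iter=20, fit=fit_ridge):
    """Greedy screening: keep a generated batch only if it lowers the test MAPE."""
    x_min, x_max = X_s.min(axis=0), X_s.max(axis=0)
    X_f, y_f = X_s.copy(), y_s.copy()               # D_f(0): real training set only
    best = mape(y_test, fit(X_f, y_f)(X_test))      # MAPE without virtual samples
    for _ in range(n_iter):
        X_v, y_v = generate(n_batch)                # candidate batch D_vir(i)
        keep = np.all((X_v >= x_min) & (X_v <= x_max), axis=1)  # range screening
        if not keep.any():
            continue
        X_c = np.vstack([X_f, X_v[keep]])
        y_c = np.concatenate([y_f, y_v[keep]])
        err = mape(y_test, fit(X_c, y_c)(X_test))
        if err < best:                              # batch judged valid (OTR > 0)
            X_f, y_f, best = X_c, y_c, err
    return X_f, y_f, best
```

The returned set plays the role of the reconstructed sample set; by construction its test MAPE is never worse than that of the real training set alone.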

Ultrasonic Testing Experiment and Metallographic Observation Experiment
TC25 titanium alloy has good high-temperature strength and thermal stability, which makes it an ideal material for aero-engines. Figure 2 shows the process of the entire ultrasonic testing and metallographic observation experiment. The samples in this experiment are titanium alloy ring forgings produced under different process standards. The ring forgings were cut into 168 samples. Metallographic samples were prepared and observed with an MR5000 inverted metallographic microscope. More than 20 metallographic images were randomly selected for each sample; typical metallographic images are shown in Figure 3. All metallographic images of each sample were measured with the digital image processing software ImageJ to obtain the equivalent diameter, area ratio, and length-to-minor-axis ratio of all intact primary α phase grains within the field of view.
For ultrasonic testing, the test surface of each sample is evenly divided into 6 sampling areas of 5 mm × 5 mm, and each sample is inspected by the contact longitudinal-wave pulse-echo method. The equipment consists of an Olympus 5077PR pulse generator, a 10 MHz Olympus V112-RM single-crystal straight probe, and a PicoScope 3000 series acquisition card. The probe crystal is larger than the sampling area, which ensures full coverage of the sample. The nonlinear component of the ultrasonic signals produced by the specimens was acquired by the P-wave collinear harmonic method with a RAM-5000-SNAP system (RITEC Inc., Milwaukee, WI, USA); the frequencies of the nonlinear transmitting and receiving transducers were 2.5 MHz and 5 MHz, respectively.
The TC25 experimental material contains 168 valid samples, each with five ultrasonic characteristic values (mean sound velocity, mean attenuation, primary offset, secondary offset, and nonlinear coefficient) and one primary α phase grain size value. For the 168 valid samples, the K-fold cross-validation method with K = 3 is used to test the model accuracy, i.e., 118 samples are randomly selected as the training set and the remaining 50 samples are used as the test set.

Constructing the Ultrasound Evaluation Model of the Grain Size of Primary α Phase
Each valid sample contains five ultrasonic characteristic values (mean sound velocity $C_L$, mean attenuation $\alpha$, primary offset $\hat{A}_{F1}$, secondary offset $\hat{A}_{F2}$, and nonlinear coefficient $\beta$) and one primary α phase grain size value $D$. Table 1 shows part of the original data of the TC25 material.
The set of five ultrasonic characteristic values is taken as the input of the real samples and expressed as the matrix $X_T = (X_1, X_2, \ldots, X_n)^{\mathrm{T}}$ with $X_i \in \mathbb{R}^r$, where $n$ is the number of samples and $r = 5$ is the dimension of a single sample. The primary α phase grain size is taken as the output of the real samples and expressed as $Y_T = (y_1, y_2, \ldots, y_n)$. Together they form the real sample set $D_T = (X_T, Y_T)$ of the TC25 titanium alloy. Because the ultrasonic characteristic values differ in magnitude, they are normalized to a common interval.
The construction of the model is divided into three processes: virtual sample generation, virtual sample validity analysis and screening, and modeling. They are described in detail below.
(1) Virtual sample generation process.
The TC25 real training sample set is preprocessed and normalized to the interval [0, 100]. To ensure a coding accuracy of $\Delta x = 0.001$, the coding length is set to $k = 17$, since $100/(2^{17} - 1) \approx 7.6 \times 10^{-4} \le 0.001$, whereas $k = 16$ gives about $1.5 \times 10^{-3}$. Binary encoding of one valid sample (five ultrasonic characteristic values and one grain size value) therefore yields a 6 × 17 black-and-white image, shown as x in Figure 4, i.e., the real image data after binary encoding. After preprocessing, a set $D_s$ of $N_s$ black-and-white images is obtained, where $N_s$ is the size of the real training sample set $D_s$.
The GAN generates the virtual sample image set $D_{vir} = (X_{vir}, Y_{vir}) = G(D_s, N_{vir})$, where $N_{vir}$ is the number of virtual sample images in one generated group. The generated virtual sample image set is decoded and restored to a virtual sample set. Figure 4 shows the GAN framework used, where $G(z)$ is an output image of the generator G.
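The preprocessing above can be sketched as a column-wise min-max scaling to [0, 100], fitted on the real training set, plus the smallest coding length that meets a target accuracy, $k = \lceil \log_2((U_{max} - U_{min})/\Delta x + 1) \rceil$. The inverse scaling is what would map decoded virtual samples back to physical units. `MinMaxTo100` and `min_coding_length` are hypothetical names, and the raw values in the usage check are made up for illustration.

```python
import math
import numpy as np

def min_coding_length(u_min, u_max, dx):
    """Smallest k with (u_max - u_min) / (2**k - 1) <= dx."""
    return math.ceil(math.log2((u_max - u_min) / dx + 1))

class MinMaxTo100:
    """Column-wise min-max scaling to [0, 100], fitted on the real training set."""

    def fit(self, data):
        data = np.asarray(data, dtype=float)
        self.lo = data.min(axis=0)
        self.span = data.max(axis=0) - self.lo
        self.span[self.span == 0] = 1.0  # guard against constant columns
        return self

    def transform(self, data):
        return (np.asarray(data, dtype=float) - self.lo) / self.span * 100.0

    def inverse(self, scaled):
        """Map decoded (virtual) samples back to the original units."""
        return np.asarray(scaled, dtype=float) / 100.0 * self.span + self.lo
```

With a range of [0, 100] and $\Delta x = 0.001$, `min_coding_length` returns 17, matching the coding length stated above.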
(2) Virtual sample validity analysis and screening process.
The optimal virtual sample set is generated by the effective virtual sample generation (EVSG) algorithm, i.e., $D_{vir} = \mathrm{EVSG}(D_s)$.
(3) Modeling process.
The real training sample set $D_s$ and the optimal virtual sample set $D_{vir}$ are combined to form a new reconstructed sample set $D_f = D_s \cup D_{vir}$. Using $D_f$, the support vector regression method was adopted to build the ultrasound evaluation model of the primary α phase grain size.
To verify the prediction model, its accuracy on the real test samples is measured by the test accuracy $R^2$ (coefficient of determination); the closer $R^2$ is to 1, the better the model:
$$R^2 = 1 - \frac{\sum_{i=1}^{m}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{m}(y_i - \bar{y})^2} \tag{5}$$
where $m$ is the number of real test samples, $y_i$ is the value of the $i$-th real test sample, $\bar{y}$ is the mean of the real test samples, and $\hat{y}_i$ is the predicted value. The OTR and $R^2$ values of the model are calculated, the three processes are repeated $M$ times, and the average OTR and $R^2$ values are reported.
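The test accuracy $R^2$ and the averaging over $M$ repetitions can be written compactly. In this sketch, `run_once(seed)` is a hypothetical callable that would perform the three processes once and return its (OTR, $R^2$) pair.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination on the real test samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)

def repeat_and_average(run_once, M=10):
    """Average (OTR, R2) over M repetitions of generation, screening, and modeling."""
    results = np.array([run_once(seed) for seed in range(M)], dtype=float)
    return results.mean(axis=0)
```

A perfect prediction gives $R^2 = 1$, and predicting the test mean for every sample gives $R^2 = 0$.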

Experiment
This experiment was divided into two parts. In the first part, a standard function was used to verify the effectiveness of the proposed method. In the second part, the ultrasound evaluation model of the TC25 primary α phase grain size generated based on the GAN was constructed. The selected datasets were studied in detail, and the experimental results were analyzed.

Data Description
To verify the effectiveness of the method, a benchmark function was used in the simulation. In this study, a nonlinear benchmark function is considered, … where the input $x = [x_1, \ldots, x_5]^{\mathrm{T}}$ follows the uniform distribution on (0, 1). Fifty data points were selected as the original dataset and divided into a training dataset (30 samples) and a test dataset (20 samples). Table 2 shows some of the selected data points.

Parameter Selection
The binary-coded training dataset is used to train a GAN that produces reasonable virtual samples. The GAN architecture is as follows. The generator is a five-layer neural network whose input is a one-dimensional noise vector of length 100 and whose three hidden layers are fully connected (MLP) layers with 512 neurons each. The generator uses the leaky rectified linear unit (leaky ReLU) as the activation function in all layers except the last, which uses tanh. The discriminator is a four-layer neural network whose two hidden layers are fully connected (MLP) layers with 60 neurons each; it uses leaky ReLU activations except in the last layer, which uses the sigmoid function. Both the generator and the discriminator use the Adam optimizer with a learning rate of 0.0002.
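To make the layer sizes concrete, here is a NumPy forward pass of the two networks described above. Several details are assumptions not given in the text: the generator output size of 102 (a flattened 6 × 17 image), a leaky-ReLU slope of 0.2, He-style random initialization, and thresholding the tanh output at 0 to recover black/white bits. Training with Adam (learning rate 0.0002) is not shown, since it would require an autodiff library.

```python
import numpy as np

def leaky_relu(a, slope=0.2):
    # slope 0.2 is an assumption; the text does not state it
    return np.where(a > 0, a, slope * a)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_mlp(sizes, rng):
    """Random (He-style) weights and zero biases for a fully connected network."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x, out_act):
    """Leaky ReLU on every hidden layer, out_act on the last layer."""
    for W, b in params[:-1]:
        x = leaky_relu(x @ W + b)
    W, b = params[-1]
    return out_act(x @ W + b)

IMG = 6 * 17  # assumed: one encoded sample image (6 values x 17 bits), flattened
rng = np.random.default_rng(0)
G = init_mlp([100, 512, 512, 512, IMG], rng)  # input, 3 hidden layers, output (5 layers)
D = init_mlp([IMG, 60, 60, 1], rng)           # input, 2 hidden layers, output (4 layers)

z = rng.normal(size=(8, 100))                 # batch of 8 noise vectors
fake = forward(G, z, np.tanh)                 # generated images, values in [-1, 1]
p = forward(D, fake, sigmoid)                 # D's probability that each input is real
bits = (fake > 0).astype(np.uint8).reshape(8, 6, 17)  # assumed black/white thresholding
```

The asymmetry (three 512-unit hidden layers in G versus two 60-unit layers in D) matches the "fully connected networks of different sizes" described in the abstract.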

Analysis of Effectiveness
A benchmark dataset was introduced to verify the effectiveness of this method. First, an SVM regression model was constructed using the 30 original samples. Then, the GAN-based evaluation model was built with the proposed method. Finally, evaluation models built with the MTD- and MD-MTD-based virtual sample generation methods were used for comparison. Figure 5 shows the distributions of the real and predicted test sample values for the evaluation models with and without virtual samples. Effective virtual samples can fill the information gaps between the original sample points and expand limited, insufficient sample data, thereby enhancing the generalization ability of learning and reducing the error of the model.

Impact of Virtual Sample Number on Ultrasound Evaluation Model
The virtual sample set is generated over multiple GAN iterations, and the impact of the number of virtual samples on the evaluation model is examined. Figure 6 shows how the $R^2$ and OTR values of the real training samples in the reconstructed sample set $D_f(i)$ ($i = 1, 2, \ldots$; $N = 5$) change as the number of iterations increases.

As Figure 6 shows, as the number of virtual samples $N_{vir}$ increases, the $R^2$ and OTR curves fluctuate with approximately the same trend. When $N_{vir} = 35$, both reach their maximum values: $R^2 = 0.834$ and OTR = 36.838%. Therefore, the optimal number of virtual samples is $N_{vir} = 35$, at which the model performs best.
The OTR curve indicates that an optimal number of virtual samples exists, i.e., there is an optimal subset of virtual samples that gives the best ultrasound evaluation performance. Figure 7 visualizes the distribution of the virtual samples generated by the GAN for $N_{vir} = 35$ together with the real data; red points are virtual samples and blue points are real samples. Figure 7a shows the radial visualization, and Figure 7b shows the star coordinate visualization. The red virtual sample points are evenly distributed within the region occupied by the blue real sample points.
For the different virtual sample generation methods, the distributions of the real and predicted test sample values in the ultrasound evaluation model are shown in Figure 8.

Comparison with Traditional Ultrasound Evaluation Model
The proposed method was compared with traditional ultrasound evaluation models, namely single-ultrasonic-parameter evaluation models obtained by curve fitting and multi-ultrasonic-parameter evaluation models obtained by different machine-learning methods.
The least-squares method was used to establish ultrasound evaluation models relating a single ultrasonic characteristic parameter (sound velocity) to grain size. Here, $\Gamma_1$ denotes the first-order sound velocity model and $\Gamma_1^*$ the second-order sound velocity model. The fitting curves of the single-ultrasonic-characteristic evaluation models are shown in Figure 9. Figure 9 shows that a large gap exists between the fitted values of the first-order model and the real training values, whereas the second-order model fits better.
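The two single-parameter models can be fitted with NumPy's least-squares polynomial fitting. This is a minimal sketch assuming $\Gamma_1$ and $\Gamma_1^*$ are ordinary first- and second-degree polynomials in $C_L$; `Polynomial.fit` rescales the velocity axis internally, which keeps the quadratic fit well conditioned for velocities in the thousands of m/s. The velocities and grain sizes in the usage check are synthetic.

```python
import numpy as np
from numpy.polynomial import Polynomial

def fit_velocity_models(c_l, d):
    """Least-squares fits of grain size D on sound velocity C_L.

    Returns (gamma1, gamma1_star): first- and second-order polynomial models,
    each callable on new velocity values.
    """
    gamma1 = Polynomial.fit(c_l, d, deg=1)
    gamma1_star = Polynomial.fit(c_l, d, deg=2)
    return gamma1, gamma1_star
```

On data with genuine curvature, the second-order model leaves much smaller residuals than the first-order one, mirroring the behavior described for Figure 9.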
An ultrasound evaluation model relating multiple ultrasonic features to grain size was constructed by multiple linear regression and compared with the proposed method. Figure 10 compares the test sample values with their predicted values in the multi-ultrasonic-feature evaluation models. As Figure 10 shows, the curves of the multiple linear regression and SVM ultrasonic evaluation models fluctuate strongly, whereas the predictions of the proposed method are closest to the real test sample values and fluctuate within a small range.
Table 5 shows the MAPE values of the different ultrasound evaluation models. The single-parameter ultrasound evaluation models have large MAPE values, and their evaluation performance is far below that of the proposed method. The MAPE values of the multi-parameter ultrasound evaluation models are smaller than those of the single-parameter models but significantly higher than that of the proposed method. Therefore, the proposed method is superior to both the single-parameter and the traditional multi-parameter ultrasound evaluation models.
Compared with the four traditional ultrasound evaluation models, the proposed model has a higher training accuracy, smaller MAPE, and greater stability.

Evaluation Methods | MAPE (%)
First-order sound velocity method | 9.814

Figure 10. Contrast curve graph between the test sample value and the predicted value of the test sample in the multi-ultrasonic feature evaluation model.

Conclusions
Models built on small samples have low accuracy because of the limited number of samples available in ultrasonic evaluation experiments on microstructure. To address this problem, an ultrasound evaluation method for the primary α phase grain size based on virtual samples generated by a GAN was proposed, with TC25 titanium alloy forgings as the research object. The conclusions are as follows.
(1) A GAN was used to generate virtual samples, a virtual sample screening mechanism was introduced, and an ultrasound evaluation model was constructed using SVM regression. The experimental results show that the model has high accuracy and small error.
(2) The inclusion of virtual samples can better address small-sample problems, such as insufficient sample information or a small number of samples.
(3) Compared with the traditional ultrasound evaluation models, the ultrasound evaluation method with added virtual samples improves the learning accuracy obtained from the original small samples. Compared with evaluation models built with the MTD and MD-MTD virtual sample generation methods, the predictions of the proposed method are closer to the real sample values. Hence, the ultrasound evaluation model of the primary α phase grain size based on GAN-generated virtual samples has higher accuracy, more stable performance, and smaller error than the other models.