Identiﬁcation of Similar Seismic Waves Using the Phase-Only Correlation Function and Wavelet Transform

: Accurately determined acoustic emission (AE) locations provide signiﬁcant information on fracture systems, such as the orientation of fractures in a geothermal reservoir. To determine the relative source locations among a group of seismic events, similar AE waveforms must be detected and the relative arrival times of the P and S waves must be determined. In this paper, a method to identify similar AE waveforms is proposed, in which wavelet transform scalograms are used to determine the phase-only correlation function. The proposed method was applied to arbitrarily selected seismic waveforms, and its feasibility was evaluated by comparing the results with those obtained when the phase-only correlation function was obtained by using Fourier transform results.


Introduction
During acoustic emission (AE) measurements at geothermal fields, AE events with similar waveforms are often observed. An AE group with similar waveforms is called an "AE multiplet". AE multiplets are considered to comprise AE events occurring at a fracture with the same characteristics. The source location of these seismic events can be determined with a high accuracy by using a cross-correlation analysis [1], which allows for the strike and dip of the fault plane to be estimated. For instance, when the distance from the source locations to the observatories is about 1 km, the relative position of the source locations can be detected with an accuracy of several tens of centimeters. It makes sense from the principle of measurement that determining the relative relationship between the source locations is more accurate than determining the absolute position of the source locations. By estimating the geometrical relationship among a group of microearthquakes with similar waveforms, the fracture orientation and the migration of microearthquake source locations during hydraulic fracturing and fluid injection can be detected [2]. If several similar waveforms are observed and their source locations are determined to the order of tens of centimeters, the fracture surface that has been shear-slipped due to the increase in pore pressure can be estimated from the strike and dip of the plane defined by the distribution of the source locations. The direction of the pressure propagation due to water injection can also be estimated from the direction of the movement of the source location. The multiplet is useful because it is able to estimate not only the fracture distribution in the reservoir, but also the repeating moment release on a fracture plane, which would be related to fluid flow in the fracture, etc. [3][4][5][6][7]. Conventionally, similar waveforms have been identified by finding the mutual cross-correlation coefficient or a coherence function. The development of an automated efficient searching method for similar waveforms is needed for the evaluation of a fracture system using microearthquake waveforms, because their visual identification and classification are time consuming. An automated efficient searching method is also important in detecting slow-slip seismic events, which have recently been of interest for understanding the mechanical behavior of faults.
Image matching and pattern recognition are important data processing tools in geophysical fields, and several image matching and pattern recognition methods have been proposed for the identification of similar seismic events, for the detection of low-frequency earthquakes within a tremor, seismic pattern recognition, and ground-penetrating radar applications [8][9][10][11][12][13]. The source location of a group of similar seismic events (a multiplet) can be determined with a high resolution by using cross-correlation or cross-spectrum analyses. These techniques are widely used to locate faults and to determine the orientation of rock fractures in geothermal and petroleum fields [14][15][16], and microearthquake signal processing for subsurface measurements is increasing in importance (e.g., [17]) The phase-only correlation (POC) function is widely used in image data matching, such as for fingerprint recognition by automated teller machines (ATMs) [18]. The POC function can be used to recognize the similarity of two images with a high accuracy. In waveform data analysis, a method for detecting similar waveforms using the phase-only correlation function with a moving time window Fourier spectrum has been proposed [8,19]. This method is based on the new idea of transforming the waveform data to the time and frequency axes, and considering it as two-dimensional image data drawn on the time and frequency axes to calculate the phase-limited correlation function. Compared with conventional detection methods using correlation functions or a coherence function, this method evaluates the time variation of waveforms in both the time and frequency domains, which allow for small differences in the waveforms to be evaluated and the waveforms to be classified according to their similarity. With this method, however, the resolution of the spectrum in the time domain is limited by the length of the time window used for the Fourier transform; a comparatively longer time window length must be used to estimate the lower frequency components.
In this paper, a similar waveform detection method using the phase-only correlation function with wavelet transform, which enables spectral estimation with a relatively high time resolution, is investigated. Through the application to microseismic waveforms, it has been found that the proposed method can classify the waveforms more accurately and quantitatively than the FFT-based method previously proposed by the author. The proposed method, as an efficient automatic search method for similar waveforms, will contribute to the measurement of the geothermal reservoir structure by determining the source location of similar waveforms with a high accuracy.

Wavelet Transform
Wavelet transform, a time−frequency analysis method, has various geophysical applications [20][21][22][23][24][25]. In Fourier transform, the length of the analysis time window is fixed, regardless of whether high-or low-frequency components are being estimated. However, the length of the analysis time window used to estimate the spectrum should differ depending on the frequency component. In a moving time window Fourier transform, where the transform is performed while moving the analysis time window, the higher the frequency component, the lower the resolution in the time-axis direction. This is because the number of periods included in the time window increases when the frequency component is high. Different from Fourier transform, in wavelet transform, the length of the time window is changed during the calculation according to the frequency. In other words, the basis functions are scaled up or down so that the time resolution does not become lower at higher frequencies. Wavelet transform using a moving time window can thus more easily supplement the waveform changes in the time-axis direction than Fourier transform. Although a wavelet transform cannot overcome the uncertainty principle, it is well suited for the analysis of signals characterized by strong non-stationarity, such as microearthquake signals.

Phase-Only Correlation Function
Phase-only correlation of image data makes it possible to evaluate similarity. Phaseonly correlation is defined by the 2D inverse discrete Fourier transform (IDFT) of the combined phase R(k 1 , k 2 ) obtained from the 2D discrete Fourier transform (DFT) of two sets of image data, f (n 1 , n 2 ) and g(n 1 , n 2 ), as described by the following equations (e.g., [17]). where N 2 , and N 1 and N 2 denote the numbers of samples in the n1 and n2 directions, respectively. A F (k 1 , k 2 ) and A G (k 1 , k 2 ) denote the amplitudes of F(k 1 , k 2 ) and G(k 1 , k 2 ), respectively, and θ F (k 1 , k 2 ) and θ G (k 1 , k 2 ) are their respective phases. Then, the function R(k 1 , k 2 ) can be defined as follows: where G(k 1 , k 2 ) represents the complex conjugate of G(k 1 , k 2 ) and ∆θ(k 1 , The phase-only correlation function can then be represented by using inverse 2D IDFT, as follows where the value of r (n 1 , n 2 ) has a range from 0 to 1. The peak value of r (n 1 , n 2 ) provides a measure of the similarity of the two images, and the position of the peak shows the translational displacement between the two images.

Detection of Similar Seismic Events with the Phase-Only Correlation Function
The performance of the phase-only correlation method using a moving time window Fourier transform was evaluated previously [8]. In this study, the method with a wavelet transform was evaluated and then its performance was compared to that of the method using the Fourier transform. Figure 1 shows a flow chart of the calculation method for the phase-only correlation using wavelet transform. AE waveform time series data were used in the analysis. In this method, wavelet transform was used for the spectrum estimation instead of Fourier transform. The scalogram, which is a transformation of the signal calculated by the wavelet transform, is regarded as 2D image data; thus, an image matching using the POC function is applicable. Two arbitrarily selected waveforms were first transformed to scalograms by transforming the time series data to time and frequency domain data, allowing the data to be regarded as 2D image data. Next, 2D DFT was applied to each of the two scalograms, and the complex vectors, one of which needed to be a conjugate complex number, obtained from each scalogram were multiplied together. Then, the resulting normalized cross-spectrum was inverted in the time domain by using 2D IDFT. The value of the peak of the phase-only correlation function r (n 1 , n 2 ) represented the similarity of the two waveforms. Figure 2 shows the group of waveforms used to evaluate the performance of the proposed method. These waveforms were produced by microearthquakes that occurred at the lower end of an active fault in Sendai City, Miyagi Prefecture, at a depth of about 11 km, and they were detected by a triaxial seismometer in a borehole. The distance from the hypocenters to the detector was around 15 km. The waveforms were those of the vertical component signal, and the sampling frequency was 100 Hz. These waveforms of microearthquakes observed by a broad frequency band detector were used for the evaluation of the proposed method, because the P and S waves could be clearly read and it was easy to visually judge their similarity.   Figure 2 shows the group of waveforms used to evaluate the performance of the proposed method. These waveforms were produced by microearthquakes that occurred at the lower end of an active fault in Sendai City, Miyagi Prefecture, at a depth of about 11 km, and they were detected by a triaxial seismometer in a borehole. The distance from the hypocenters to the detector was around 15 km. The waveforms were those of the vertical component signal, and the sampling frequency was 100 Hz. These waveforms of microearthquakes observed by a broad frequency band detector were used for the evaluation of the proposed method, because the P and S waves could be clearly read and it was easy to visually judge their similarity.  Waveform numbers 1 to 5 were visually identified as a group of similar waveforms, whereas waveform numbers 6 to 10 were judged to be dissimilar waveforms. Two of the waveforms, which were processed by a band-pass filter with a bandwidth of 0.1 to 40 Hz to remove low-frequency fluctuations and power supply temperature extinction, used in the evaluation and the results of the wavelet transform are shown in Figures 3 and 4. A Morse wavelet with a symmetry parameter of 3 and a time-bandwidth product of 60 was used as the analyzing wavelet [20]. The wavelet transform result is shown as frequency magnitude changes on the scalogram; on each scalogram, frequency increases of 10 to 30 Hz and two S-wave peaks can be seen. It can be also seen that the frequency value on the scalogram changed as the wave energy changed with the arrival of the P and S waves. As Waveform numbers 1 to 5 were visually identified as a group of similar waveforms, whereas waveform numbers 6 to 10 were judged to be dissimilar waveforms. Two of the waveforms, which were processed by a band-pass filter with a bandwidth of 0.1 to 40 Hz to remove low-frequency fluctuations and power supply temperature extinction, used in the evaluation and the results of the wavelet transform are shown in Figures 3 and 4. A Morse wavelet with a symmetry parameter of 3 and a time-bandwidth product of 60 was used as the analyzing wavelet [20]. The wavelet transform result is shown as frequency magnitude changes on the scalogram; on each scalogram, frequency increases of 10 to 30 Hz and two S-wave peaks can be seen. It can be also seen that the frequency value on the scalogram changed as the wave energy changed with the arrival of the P and S waves. As the two waveforms were similar, the results of the wavelet transforms were also similar. The phase-limited correlation function result calculated using the scalograms in Figures 3 and 4 shows a sharp peak with a value of 0.38 ( Figure 5).     The wavelet transform does not compute the scalogram of the low-frequency components at the ends of the waveform in the time window. This region increased from the center of the time window to the end. If the energy of the waveform is concentrated in the low-frequency region, the scalogram at the lower frequency portion will be missing at both ends of the time window. In this sense, the sampling frequency and the frequency range of the dominant frequency components of the waveform are important. The influence of the sampling frequency on the estimation of the scalogram, as well as the detectability of this method, have not been investigated using real waveform data. In the presented paper, the main energy of the wave was distributed within 15-30 Hz under the sampling frequency of 100Hz. Therefore, the scalogram of that region could be estimated. In the analysis of this paper, for the part where the scalogram was not calculated, the value was set to zero and the POC was calculated.  Figures 3 and 4), where similarity was evaluated by visual inspection. The n1 and n2 axes correspond to the shift lags in the time-and frequency-axis directions, respectively. Figure 6 shows the waveform of a microearthquake that does not resemble the two waveforms in the above evaluation and the wavelet transform results. This waveform has a low signal-to-noise ratio, and the P and S waves are not clear. The phase-only correlation function result calculated between the scalograms shown in Figures 3 and 6 (i.e., two waveforms visually judged to be dissimilar) shows several peaks with low values (Figure 7). This result indicates that the similarity between these two waveforms is lower compared with that between the waveforms shown in Figures 3 and 4.

Application to a Group of Waveforms
The proposed method was applied to the group of waveforms shown in Figure 2 where the seismic events were induced for several months at a depth of about 10km jus below the Ohokura artificial lake. The analyses presented below were performed usin MATLAB. The POC function was calculated for all combinations of the two waveforms Figure 8 shows the matrix of the peak values calculated for each combination. The pea POC function value between waveform numbers 1 and 2 was 0.38, as described above The peak values calculated between the combinations of waveform numbers 1 to 5 wit The wavelet transform does not compute the scalogram of the low-frequency components at the ends of the waveform in the time window. This region increased from the center of the time window to the end. If the energy of the waveform is concentrated in the low-frequency region, the scalogram at the lower frequency portion will be missing at both ends of the time window. In this sense, the sampling frequency and the frequency range of the dominant frequency components of the waveform are important. The influence of the sampling frequency on the estimation of the scalogram, as well as the detectability of this method, have not been investigated using real waveform data. In the presented paper, the main energy of the wave was distributed within 15-30 Hz under the sampling frequency of 100Hz. Therefore, the scalogram of that region could be estimated. In the analysis of this paper, for the part where the scalogram was not calculated, the value was set to zero and the POC was calculated.

Application to a Group of Waveforms
The proposed method was applied to the group of waveforms shown in Figure 2, where the seismic events were induced for several months at a depth of about 10km just below the Ohokura artificial lake. The analyses presented below were performed using MATLAB. The POC function was calculated for all combinations of the two waveforms. Figure 8 shows the matrix of the peak values calculated for each combination. The peak POC function value between waveform numbers 1 and 2 was 0.38, as described above. The peak values calculated between the combinations of waveform numbers 1 to 5 with waveform numbers 6 to 10 had relatively lower values, and the peak values of the combinations of waveform numbers 6 to 10 also had low values. This result shows that a group of similar waveforms could be extracted from arbitrarily mixed waveform data. represented by the maximum values of the POC function. In these results, groups of similar and dissimilar waveforms are distinguished. Mutual distances are smaller between waveforms with larger peak POC function values, and waveforms with smaller mutual distances are grouped together. Figure 11 shows the results when the method was applied to 30 waveforms, including the 10 waveforms in Figure 2. It can be seen that similar waveforms with waveform numbers 1 to 5 were classified as one group. Thus, this method can detect similar waveforms even when the number of waveforms increases.   . These results show that using WT results in larger values of POC between similar waveforms than using FFT, and smaller values of POC when the waveforms are not similar. These results show that using WT resulted in larger values of POC between similar waveforms than using FFT, and smaller values of POC when the waveforms were not similar. This is because WT is highly time-resolved, so small changes in the waveform with respect to time are reflected in the scaogram, and small differences between the waveforms are also reflected in the POC value. In this way, POC using WT can detect more similar waveforms and classify them quantitatively.    Figure 10 shows the results of a "cluster analysis" conducted by the method used in [8], where an agglomerative hierarchical clustering was introduced and the Ward method was used to calculate the distance between the pairs of observables. The distance here is represented by the maximum values of the POC function. In these results, groups of similar and dissimilar waveforms are distinguished. Mutual distances are smaller between Energies 2021, 14, 4527 9 of 11 waveforms with larger peak POC function values, and waveforms with smaller mutual distances are grouped together. Figure 11 shows the results when the method was applied to 30 waveforms, including the 10 waveforms in Figure 2. It can be seen that similar waveforms with waveform numbers 1 to 5 were classified as one group. Thus, this method can detect similar waveforms even when the number of waveforms increases.          Figure 11. Waveform classification results by cluster analysis, where the method was applied to 30 waveforms.

Conclusions
A method for the detection of similar waveforms was developed, in which the phaseonly correlation function was determined by using scalograms obtained by wavelet transform. The proposed method was applied to a group of microearthquakes that included both similar and dissimilar waveforms. The result showed that the degree of similarity was expressed by the peak value of the phase-only correlation function. Thus, the waveforms could be classified into groups of similar or dissimilar waveforms, in the same way as in previous research, in which the phase-only correlation function was determined using Fourier transform results. Compared with the Fourier transform results, the scalograms calculated by the wavelet transform had a higher time resolution in the time-frequency domain, allowing small waveform changes to be expressed. As a result, it has become possible to judge the similarity and dissimilarity of waveforms more accurately and quantitatively. In addition, it has also been shown that the waveforms can be automatically classified into similar waveform groups when combined with the agglomerative hierarchical cluster analysis method. Therefore, this method would enable better classification of waveforms of microearthquakes induced for geothermal reservoir evaluation, as well as for the detection of slow-slip seismic events using a reference waveform to find similar slow-slip events.