1. Introduction
Seismic refraction data analysis is one of the principal methods for near-surface modeling [1,2,3,4]. A critical step of the method is first-arrival picking for direct and head waves. It influences the effectiveness of many subsequent steps such as static correction [5,6] and velocity modeling [7]. Misidentification of these arrival times may have significant effects on the hypocenters [8]. However, raw seismic traces are typically contaminated by strong background noise under complex near-surface conditions [7]. The main challenge is to extract the first arrivals accurately despite noise interference [9,10] and irregular topography [11]. According to Akram and Eaton [8], there is an urgent need for automatic picking methods as the scale of seismic data continues to grow.
There are three main types of first-arrival picking methods. The first is Coppens’ method [12] and its variants. It uses energy ratios within two amplitude windows to process the data [12]. Al-Ghamdi and Saeed [13] improved this method by using adaptive thresholds. The multi-window algorithm [14] uses three moving windows instead and distinguishes signals from noise using the average of the absolute amplitudes in each window. Sabbione and Velis [15] used a modified form of Coppens’ method along with entropy and fractal-dimension methods to pick first-arrival travel times. The second is the direct correlation method proposed by Molyneux and Schmitt [16]. It uses the maximum cross-correlation value as a criterion [16], which fails on data sets with a low signal-to-noise ratio (S/N). The third is the backpropagation neural network method [17], which applies backpropagation neural networks to first-arrival refraction event picking and seismic data trace editing [17].
Recently, a new algorithm based on fuzzy c-means was proposed to deal with low-S/N data [18]. It divides microseismic data into two clusters according to the different levels of similarity between the signals and noise. The initial time of the signal cluster is then regarded as the first-arrival time. Many other automatic picking schemes have been reported, such as digital image segmentation [19], the STA/LTA method [20,21], the Akaike information criterion [22], a fractal-based algorithm [23] and TDEE [11]. However, the detection accuracy of existing algorithms is still unsatisfactory.
In this paper, we propose the first-arrival picking through sliding windows and fuzzy c-means (FPSF) algorithm.
Figure 1 illustrates the overall structure of our algorithm through an example. In the range detection stage, each trace is processed with a vertical sliding window to seek its first-arrival interval. Then, a horizontal window is employed to adjust the vertical window of each trace. In this way, the first-arrival range is identified. In the first-arrival travel time picking stage, a particle swarm optimization (PSO) algorithm seeks the original cluster centers. Finally, a fuzzy c-means (FCM) algorithm picks the first-arrival travel times.
FPSF presents two new features to handle the challenge mentioned earlier. One is to introduce a range detection stage before first-arrival picking. We design a range detection technique using sliding windows in the vertical and horizontal directions. On the one hand, the energy of a single trace shifts abruptly in the first-arrival interval. Hence, a vertical sliding window is employed to track the inner-trace change. The quality of the window is measured by the difference between its upper and lower parts and by its position in the trace, so that it finds the interval where the energy values shift most abruptly. On the other hand, the first-arrival travel times of adjacent traces are close to each other. Hence, a horizontal sliding window is employed to track the inter-trace change. It adjusts the locations of the vertical windows to ensure their similarity. The vertical windows of all traces together constitute the first-arrival range. With this technique, the data size is decreased dramatically, and the accuracy can be improved.
The other is to employ PSO and FCM for clustering seismic data. FCM is successful in image processing, so it is expected to apply well to our data [18]. The data are restricted to the first-arrival range. First, an improved particle swarm optimization is used to determine the original cluster centers. Second, an improved fuzzy c-means, starting from these original cluster centers, picks the first-arrival travel times within the range.
Experiments are undertaken on two field data sets. We compare FPSF with the modified Coppens’ method (MCM) [15], the direct correlation method (DC) [16], and the backpropagation neural network method (BNN) [17]. The results show that FPSF is more accurate than BNN and DC on both data sets and more accurate than MCM on one of them. FPSF can be used in many domains such as image processing and seismic data processing.
The rest of the paper is organized as follows. In Section 2, we review some related works. In Section 3, we build a data model, define the first-arrival picking problem and introduce some concepts. In Section 4, we elaborate on the principle of the proposed method. In Section 5, experiments on two field data sets verify the effectiveness of the method. Finally, Section 6 summarizes this paper.
2. Related Works
The history of seismic refraction data analysis can be traced back to the 1920s [6]. Seismic refraction data analysis tasks include deconvolution [24], dynamic correction [25], static correction [26,27], velocity analysis [28], and migration [29]. Picking first arrivals [19] is an important pre-processing stage for these tasks. For example, the effectiveness of static corrections depends on the precision of the first arrivals [6,15].
There are three strategies in the development of picking first arrivals. The manual strategy relies solely on experts; therefore, it is time-consuming and occasionally inaccurate [15,19,23,30]. To make matters worse, this strategy can lead to biased and inconsistent picks because it relies on the subjective judgment of the operator [15].
The man–machine interaction strategy provides experts with software for visual inspection [19,23,30]. The expert identifies a few first arrivals, and then the software picks the others. In difficult situations, the expert intervenes in the process. Naturally, this strategy is more efficient than the manual one. However, the whole procedure is still very time-consuming and subjective.
The automatic strategy [15,16,23] aims to provide a more efficient and intelligent solution. It requires the development of advanced machine learning and data mining algorithms. Note that this strategy does not prevent experts from intervening. Experts should check the result and correct it if necessary. Naturally, if the algorithm works well, manual intervention is rare.
Currently, there are many well-known seismic data processing systems, such as Promax [31], CGG (referring to Wikipedia), Focus and Grisys [32]. They all contain the key step of picking first arrivals. Because the results are affected by data quality and parameter settings, they differ considerably from one software program to another. Therefore, an accurate, efficient and stable algorithm for this problem is needed.
4. The Proposed Algorithm
The algorithm framework has been illustrated in
Figure 1. The range detection stage is composed of vertical window sliding and horizontal window sliding. The first-arrival picking stage is composed of PSO and FCM. This section describes each stage in detail.
4.1. Range Detection
This subsection explains the range detection stage with a vertical sliding window and a horizontal sliding window.
We apply a vertical sliding window to capture, in each trace, the interval where the energy is large, arrives early and shifts abruptly. Let the window size be
l and the starting index of the current window of the j-th trace be
. We design the following optimization objective function:
where
a is the energy ratio weight. Here, the first part expresses the ratio between the upper and lower part of the window. The smaller the value, the larger the shift. The second part expresses the evaluation of the position of the window. The smaller the value, the earlier the travel time. The weight
a is used to obtain a trade-off between these two values.
Algorithm 1 lists the vertical window sliding process. Line 1 initializes the minimal value of the objective function
. Lines 2 to 10 show the process of sliding a vertical window with a step size of
k. Line 3 calculates the sum of the upper part of the window
. Line 4 calculates the sum of the lower part of the window
. Line 5 computes the objective function value of the current window according to Equation (4). Lines 6 to 9 determine whether the update condition has been reached. If so, update the minimal objective function value
and the starting index of the vertical window
in Lines 7 and 8.
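Algorithm 1: Vertical Window Sliding for One Trace |
Input: The j-th trace, window size l, ratio weight a and search step size k. |
Output: The starting index of the result window. |
Method: verticalSliding. |
1: | initialize the minimal objective function value; // Initialize |
2: | for (each window starting index, in steps of k) do |
3: |   compute the sum of the upper part of the window; // The sum of the upper part |
4: |   compute the sum of the lower part of the window; // The sum of the lower part |
5: |   compute r according to Equation (4); // Compute r |
6: |   if (r is smaller than the minimal objective function value) then |
7: |     update the minimal objective function value; |
8: |     update the starting index of the result window; // Update the starting index |
9: |   end if |
10: | end for |
11: | return the starting index of the result window; |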
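Because Equation (4) is not reproduced in this text, the sliding procedure of Algorithm 1 can only be sketched under assumptions. The Python sketch below assumes that energy is the squared amplitude, that the window is split into two equal halves, and that the objective is r = a · (upper / lower) + (1 − a) · (start / trace length). The function name `vertical_sliding` and the constant `eps` are illustrative and not taken from the paper.

```python
import numpy as np

def vertical_sliding(trace, l, a, k):
    """Return the start index of the window with the smallest objective value.

    Assumed objective (illustrative, not the paper's Equation (4)):
        r = a * (upper_energy / lower_energy) + (1 - a) * (start / len(trace))
    A small ratio means a strong upward energy shift inside the window;
    a small position term favors early windows.
    """
    energy = np.asarray(trace, dtype=float) ** 2  # assumption: energy = amplitude squared
    m = energy.size
    half = l // 2
    eps = 1e-12  # guards against division by zero in silent windows
    best_r = np.inf
    best_start = 0
    for start in range(0, m - l + 1, k):
        upper = energy[start:start + half].sum()      # earlier half of the window
        lower = energy[start + half:start + l].sum()  # later half of the window
        r = a * upper / (lower + eps) + (1.0 - a) * start / m
        if r < best_r:
            best_r = r
            best_start = start
    return best_start
```

With a weight a close to 1 the energy ratio dominates; lowering a pulls the selected window toward earlier samples, which is the trade-off described above.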
We apply a horizontal window to adjust the neighboring first-arrival intervals determined by the vertical windows. Median filtering is employed to smooth the first-arrival intervals in the window to ensure their similarity.
Definition 2. The first-arrival range matrix is an l × n matrix R, where l is the size of the vertical window and n is the number of traces. It stores the first-arrival range, which contains the first arrivals. The size of the original data set S is thereby reduced to l × n.
Algorithm 2 lists the horizontal window sliding process. Lines 1 to 9 show the process of sliding a horizontal window. Line 2 moves the horizontal window with step size b. Line 3 obtains the median m of the window. Lines 4 to 7 determine whether the difference between each element in the window and the median is too large. If so, Line 6 replaces that element with the median.
After the above steps, the range detection stage is complete and the first-arrival range, expressed by the first-arrival range matrix (R), has been determined. This is a kind of dimensionality reduction technique [47], and the data size is directly reduced by 90%. Just as pre-processing can help increase accuracy [48], the range detection stage can be viewed as a pre-processing stage.
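Algorithm 2: Horizontal Window Sliding |
Input: The starting index array of result windows, vertical window size l and horizontal window size b. |
Output: Range starting index array. |
Method: horizontalSliding. |
1: | for (each horizontal window position) do |
2: |   move the horizontal window by step size b; // Move the horizontal window |
3: |   m = median of the starting indices in the window; // Get the median of the window |
4: |   for (each of the b elements in the window) do |
5: |     if (the difference between the element and m is too large) then |
6: |       set the element to m; // Update the value with a large difference |
7: |     end if |
8: |   end for |
9: | end for |
10: | return the range starting index array; |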
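The median-based adjustment of Algorithm 2 can be illustrated with a short Python sketch. It assumes non-overlapping horizontal windows of b traces and a hypothetical threshold `tol` that decides when a difference is "too large"; the text does not state the threshold, so both the parameter and the function name `horizontal_sliding` are assumptions.

```python
import numpy as np

def horizontal_sliding(starts, b, tol):
    """Replace outlying window start indices by the median of their neighbors.

    starts : start index of the vertical window for each trace
    b      : number of traces per horizontal window
    tol    : hypothetical threshold on |start - median| (not given in the text)
    """
    starts = np.array(starts, dtype=int)  # np.array copies, so the input is untouched
    n = starts.size
    for left in range(0, n, b):           # move the horizontal window in steps of b
        window = starts[left:left + b]    # a view: edits write through to `starts`
        med = int(np.median(window))      # median start index of the window
        outliers = np.abs(window - med) > tol
        window[outliers] = med            # pull outliers back to the median
    return starts
```

For example, with `b=5` and `tol=10`, a start index of 200 among neighbors near 50 is replaced by the window median, while the neighbors themselves are left unchanged.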
4.2. First-Arrival Picking from the Range
This subsection explains the first-arrival picking stage with PSO and FCM. The data at this stage are the first-arrival range determined by the range detection stage.
We employ PSO to find the original clustering centers of FCM because of its advantages, including global optimization and fast convergence [49]. Specifically, we use the following fitness function:
where
and
are the parameters of fitness function with the constraint
. The
is the objective function of the FCM clustering method we employed. The parameter
is usually set to 2 [39].
The particle swarm velocity iterative update formula is Equation (
2). The particle swarm position iterative update formula is Equation (
3).
Definition 3. Boundaries are represented by a matrix , where is the lower bound, and is the respective upper bound.
Here, we have two boundaries, the position boundary and the velocity boundary. Let be the position boundary, and let be the velocity boundary.
Algorithm 3 lists the process of particle swarm optimization. Lines 1 to 4 initialize each of the particle’s position
and velocity
with random values. Line 5 initializes iteration times
t. Lines 6 to 11 evaluate the fitness function according to Equation (5) and record the best solution of each particle; in particular, Line 9 records each particle's own best solution. Line 12 finds the optimal particle. Line 14 updates the global optimal particle. Lines 16 to 21 update the velocity and position of each particle. Line 17 updates the velocity of each particle according to Equation (2). Lines 18 and 20 check whether the velocity and position are out of bounds and adjust them if so. Line 19 updates the position of each particle according to Equation (3).
We employ FCM to pick first arrivals according to the similarity of the first-arrival energy values of adjacent traces. The fuzzy c-means algorithm iterates over the seismic data set to obtain the clustering centers that minimize the objective function.
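Algorithm 3: PSO |
Input: The fitness function f, the matrices of position boundary and velocity boundary, the number of particles M, the inertia weight of each particle’s velocity, the global influence weight, the inertia weight w, the maximum iteration times T and the convergent error. |
Output: Solution of the best particle. |
Method: particleSwarmOptimization. |
1: | for (each of the M particles) do |
2: |   initialize the position with random values; // Initialize position and velocity |
3: |   initialize the velocity with random values; |
4: | end for |
5: | initialize t; // Initialize iteration times t |
6: | while (t has not reached T && the convergent error has not been reached) do |
7: |   for (each of the M particles) do |
8: |     if (the fitness of the particle is better than its recorded best) then |
9: |       record the best solution of the particle; // Record optimal solution of each particle |
10: |     end if |
11: |   end for |
12: |   find the optimal particle; // Find optimal particle |
13: |   if (the optimal particle is better than the global optimal particle) then |
14: |     update the global optimal particle; // Update global optimal particle |
15: |   end if |
16: |   for (each of the M particles) do |
17: |     update the velocity according to Equation (2); // Update particle velocity |
18: |     check and adjust the velocity against the velocity boundary; // Check and adjust |
19: |     update the position according to Equation (3); // Update particle position |
20: |     check and adjust the position against the position boundary; // Check and adjust |
21: |   end for |
22: |   increment t; |
23: | end while |
24: | return the solution of the best particle; |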
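The loop structure of Algorithm 3 matches the textbook particle swarm optimizer, which can be sketched in Python as follows. This is a generic minimizer, not the improved PSO of the paper: the acceleration weights `c1` and `c2`, the stopping rule (best fitness below `eps`), the default values and the fixed random seed are all assumptions made for illustration.

```python
import numpy as np

def particle_swarm_optimization(f, pos_bounds, vel_bounds, M=20, w=0.7,
                                c1=1.5, c2=1.5, T=100, eps=1e-8, seed=0):
    """Minimize f with a textbook PSO.

    pos_bounds, vel_bounds : 2 x d arrays, row 0 = lower bound, row 1 = upper bound
    c1, c2                 : assumed individual and global influence weights
    """
    rng = np.random.default_rng(seed)
    pos_lo, pos_hi = np.asarray(pos_bounds, dtype=float)
    vel_lo, vel_hi = np.asarray(vel_bounds, dtype=float)
    d = pos_lo.size
    x = rng.uniform(pos_lo, pos_hi, size=(M, d))  # random initial positions
    v = rng.uniform(vel_lo, vel_hi, size=(M, d))  # random initial velocities
    pbest = x.copy()                              # best position seen by each particle
    pbest_val = np.array([f(p) for p in x])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for _ in range(T):
        r1 = rng.random((M, d))
        r2 = rng.random((M, d))
        # velocity update, then clip to the velocity boundary
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, vel_lo, vel_hi)
        # position update, then clip to the position boundary
        x = np.clip(x + v, pos_lo, pos_hi)
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val               # particles that beat their own best
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:              # update the global best particle
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
        if gbest_val < eps:                       # assumed convergence test
            break
    return gbest, gbest_val
```

In the setting described above, `f` would be the fitness function built from the FCM objective and each particle would encode one candidate set of cluster centers; the sketch leaves that encoding to the caller.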
Definition 4. First-arrival range matrix is an matrix , where l is the height of the first-arrival range, n is the number of traces, e is the number of clustering centers, and is the membership degree of belonging to the k-th cluster.
We use the following objective function:
where
is the fuzzy indicator,
is the distance between
and
, and
is the center of the
k-th cluster.
The membership degree is updated according to
The clustering center is updated according to
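The formulas referred to as Equations (6) to (8) are not reproduced in this text. For orientation, the textbook FCM formulation is given below. It assumes N samples x_i, the symbol m for the fuzzy indicator and the Euclidean distance d_ik; the improved variant used in this paper may differ in detail.

```latex
% Objective function (textbook form of Equation (6))
J = \sum_{i=1}^{N} \sum_{k=1}^{e} u_{ik}^{m} \, d_{ik}^{2},
\qquad d_{ik} = \lVert x_i - c_k \rVert

% Membership update (textbook form of Equation (7))
u_{ik} = \frac{1}{\sum_{j=1}^{e} \left( d_{ik} / d_{ij} \right)^{2/(m-1)}}

% Cluster center update (textbook form of Equation (8))
c_k = \frac{\sum_{i=1}^{N} u_{ik}^{m} \, x_i}{\sum_{i=1}^{N} u_{ik}^{m}}
```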
Algorithm 4 lists the process of fuzzy c-means. Line 1 initializes membership matrix
U. Line 2 computes clustering objective function value
J according to Equation (
6). Lines 3 to 6 iteratively update membership matrix
U and clustering center array
. Line 4 updates membership matrix
U according to Equation (
7). Line 5 updates clustering center array
according to Equation (
8).
Algorithm 4: FCM |
Input: Original clustering center array, the first-arrival range matrix R, the number of clusters e, the fuzzy indicator and the convergent error. |
Output: Membership matrix U and clustering center array. |
Method: fuzzyClusterMethod. |
1: | initialize U; // Initialize membership matrix U |
2: | compute J; // Compute function value J according to Equation (6) |
3: | while (J has not converged) do |
4: |   update U; // Update U according to Equation (7) |
5: |   update the clustering center array; // Update according to Equation (8) |
6: | end while // Check the convergence |
7: | return U and the clustering center array; |
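A minimal Python sketch of the iteration in Algorithm 4 is given below, using the textbook fuzzy c-means updates rather than the improved variant of the paper. It assumes the samples are rows of an array `x`, that the initial centers come from a previous step such as PSO, and that convergence means the change in J falls below `eps`; the function name and the `max_iter` cap are illustrative.

```python
import numpy as np

def fuzzy_c_means(x, centers, m=2.0, eps=1e-6, max_iter=100):
    """Textbook fuzzy c-means started from given cluster centers.

    x       : (N, d) array of samples
    centers : (e, d) array of initial cluster centers
    m       : fuzzy indicator (m > 1)
    Returns the (N, e) membership matrix and the (e, d) cluster centers.
    """
    x = np.asarray(x, dtype=float)
    c = np.array(centers, dtype=float)
    prev_j = np.inf
    for _ in range(max_iter):
        # squared Euclidean distance of every sample to every center, shape (N, e)
        d2 = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)          # avoid division by zero at a center
        inv = d2 ** (-1.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)   # membership update
        um = u ** m
        j = float((um * d2).sum())          # objective for the current U and centers
        c = (um.T @ x) / um.sum(axis=0)[:, None]   # center update
        if abs(prev_j - j) < eps:           # assumed convergence test on J
            break
        prev_j = j
    return u, c
```

Each row of the returned membership matrix sums to 1, and a hard assignment can be obtained with `u.argmax(axis=1)`.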
After FCM processing, the e clustering centers are fixed. The data are divided into 10 classes, and one of the classes is the result of first-arrival picking. After the above steps, the first-arrival picking stage is complete and the first-arrival travel times have been determined.
5. Experimental Results
This section shows the experimental results with two data sets.
Figure 3a shows the field microseismic data, which consist of 280 shots from
, China. Every shot has about 400 traces, and the time sampling interval is 2 ms.
Figure 3b shows the result of range detection.
Figure 3c shows the result of FPSF.
Figure 4a shows the field microseismic data, which consist of 150 shots from
, China. Every shot has about 500 traces, and the time sampling interval is 2 ms.
Figure 4b shows the result of range detection.
Figure 4c shows the result of FPSF.
Figure 5a shows the Xinjiang field microseismic data.
Figure 5b compares the values picked by the MCM method (purple spots), the BNN method (blue spots), the DC method (green spots) and FPSF (red spots).
Table 2 shows the accuracy of each method on the two data sets. FPSF is more accurate than BNN and DC on both data sets and more accurate than MCM on one data set. Overall, FPSF compares favorably with BNN, DC and MCM on the two data sets.