First-Arrival Travel Times Picking through Sliding Windows and Fuzzy C-Means

: First-arrival picking is a critical step in seismic data processing. This paper proposes the ﬁrst-arrival picking through sliding windows and fuzzy c-means (FPSF) algorithm with two stages. The ﬁrst stage detects a range using sliding windows on vertical and horizontal directions. The second stage obtains the ﬁrst-arrival travel times from the range using fuzzy c-means coupled with particle swarm optimization. Results on both noisy and preprocessed ﬁeld data show that the FPSF algorithm is more accurate than classical methods.


Introduction
Seismic refraction data analysis is one of the principal methods for near-surface modeling [1][2][3][4].A critical step of the method is first-arrival picking for direct and head waves.It influences the effectiveness of many steps such as static correction [5,6] and velocity modeling [7].Misidentifications of these arrival times may have significant effects on the hypocenters [8].However, the raw seismic traces are always contaminated by strong background noise with complex near-surface conditions [7].The main challenge is to accurately extract the first arrivals under the noise interference [9,10] and irregular topography [11].According to Akram and Eaton [8], there is an urgent need for automatic picking methods as the scale of seismic data continues to grow.
There are three main types of first-arrival picking methods.The first is Coppens' method [12] and its variants.It uses energy ratios within two amplitude windows to process the data [12].Al-Ghamdi and Saeed [13] improved this method by using adaptive thresholds.The multi-window algorithm [14] uses three moving windows instead.Moreover, it distinguishes signals from noise using the average of the absolute amplitudes in each window.Sabbione and Velis [15] used a modified form of Coppens' method along with entropy and fractal-dimension methods to pick first-arrival travel times.The second is the direct correlation method [16].The direct correlation method was proposed by Molyneux and Schmitt [16].It uses the maximum cross correlation value as a criterion [16], which fails in data sets with low signal-to-noise ratio (S/N).The third is the backpropagation neural networks method [17].It applies backpropagation neural networks in first-arrival refraction event picking and seismic data trace editing [17].
Recently, a new algorithm based on fuzzy c-means is proposed to deal with low signal-to-noise ratio data [18].It divides microseismic data into two clusters according to the different levels of similarity between the signals and noise.Thus, the initial time of the signal cluster is regarded as the first-arrival time.Others have reported many automatic picking schemes such as digital image segmentation [19], STA/LTA method [20,21], Akaike information criterion [22], fractal-based algorithm [23] and TDEE [11].However, the detection accuracy of existing algorithms is still unsatisfactory.
In this paper, we propose the first-arrival picking through sliding windows and fuzzy c-means (FPSF) algorithm.Figure 1 illustrates the overall structure of our algorithm through an example.In the range detection stage, each trace is processed with a vertical sliding window to seek the first-arrival interval of each trace.Then, the horizontal window is employed to adjust the window for each trace.In this way, the first-arrival range is identified.In the first-arrival travel times picking stage, a particle swarm optimization (PSO) algorithm seeks original cluster centers.Finally, a fuzzy c-means (FCM) picks the first-arrival travel times.FPSF presents two new features to handle the challenge mentioned earlier.One is to introduce a range detection stage before first-arrival picking.We design a range detection technique using sliding windows on vertical and horizontal directions.On the one hand, the energy of single trace will abruptly shift in the first-arrival interval.Hence, a vertical sliding window is employed for keeping track of inner-trace change.The quality of the window is measured by the difference between its upper and lower parts, and its position in the trace.It finds the interval where the energy values suddenly shift the most.On the other hand, the first-arrival travel times of adjacent traces are approximate.Hence, a horizontal sliding window is employed for keeping track of inter-trace change.It adjusts the locations of vertical windows to ensure their similarities.All vertical windows of each trace consist of first-arrival range.With this technique, the data size is decreased dramatically, and the accuracy can be improved.
The other is to employ PSO and FCM for clustering seismic data.FCM is successful in image processing, so the application to our data is expected [18].The data is restricted to the first-arrival range.First, an improved particle swarm optimization is used to determine original cluster centers.Second, an improved fuzzy c-means which has original cluster centers will pick up first-arrival travel times among the range.
Experiments are undertaken on two field data sets.We compare FPSF with some methods including modified Coppens' method (MCM) [15], the direct correlation method (DC) [16], and backpropagation neural networks method (BNN) [17].Results show that FPSF is accurate.FPSF can be used in many domains such as image processing and seismic data processing.
The rest of the paper is organized as follows.In Section 2, we review some related works.In Section 3, we build a data model, define the first-arrival picking problem and introduce some concepts.In Section 4, we elaborate on the principle of the new method of this paper.In the sequel, two field data sets are experimented to verify the effectiveness of the method in Section 5. Finally, Section 6 summarizes this paper.

Related Works
The history of seismic refraction data analysis can be traced back to the 1920s [6].Seismic refraction data analysis tasks include deconvolution [24], dynamic correction [25], static correction [26,27], speed analysis [28], and migration [29].Picking first arrivals [19] is an important pre-processing stage for these tasks.For example, the effectiveness of static corrections depends on the precise of the first arrivals [6,15].
There are three strategies in the development of picking first arrivals.The manual strategy relies solely on the experts, therefore it is time-consuming and occasionally inaccurate [15,19,23,30].To make the matter worse, this strategy can lead to biased and inconsistent picks because it relies on the subjectivity of the selection operator [15].
The man-machine interaction strategy provides experts with software for visual inspection [19,23,30].The expert should identify a few first arrivals, and then the software will pick the others.In case of some difficult situations, the expert interfere with the process.Naturally, this strategy is more efficient.However, the whole procedure is still very time consuming and subjective.
The automatic strategy [15,16,23] aims to provide more efficient and intelligent solution.It requires the development of advanced machine learning and data mining algorithms.Note that this strategy does not prevent experts from intervening.Experts should check the result and correct it if necessary.Naturally, if the algorithm works well, manual intervention is rare.
Currently, there are many well-known seismic data processing systems, such as Promax [31], CGG (refering to wikipedia), Focus and Grisys [32].They all contain the key step of picking first arrivals.Affected by data quality and parameter setting, the results of each software program are very different.Therefore, an accurate, efficient and stable algorithm for this problem is needed.

Preliminaries
In this section, we build a data model, define the first-arrival picking problem and introduce some basic concepts.Table 1 lists notations used throughout the paper.

Data Model
Single shot gather is a basic concept in the field of geophysical prospecting.Definition 1.A single shot gather is an m × n matrix S = [s i,j ], where n is the number of traces, m is the number of samples for each trace, and s i,j is the energy value of the ith sample of the jth trace.

First-Arrival Picking
We consider the following problem.
Problem 1.The first-arrival picking problem.Input: A single shot gather S; .m] is the first-arrival time of the j-th trace.

S
The single shot gather F The first-arrival times l The vertical window size b The horizontal window size λ The starting index of the current window a The energy ratio weight k The search step size Λ The starting index array of result windows n The number of traces m The number of samples for each trace R The first-arrival range matrix θ The parameters of fitness function π The parameters of fitness function B The boundary matrix B 1 The position boundary matrix B 2 The velocity range matrix g The dimension of the input problem f The fitness function M The number of particles δ 1 The inertia weight of each particle's velocity δ 2 The global influence weight w The inertia weight T The maximum iteration times ε The convergent error x * The solution of the best particle U The first-arrival range matrix e The number of clustering centers γ The fuzzy indicator J FCM The objective function c k The center of the k-th cluster d k (i, j) The distance between s i,j and c k That is, for any trace j, there is exactly one sample f (j) corresponding to the first-arrival travel time.
Figure 2b shows the output of first-arrival picking with the input shown in Figure 2a.Every trace has one first-arrival travel time.The red point is the first-arrival location of every trace.

Fuzzy c-Means
FCM algorithm was proposed by Dunn [33].It was promoted as the general FCM clustering algorithm by Bezdek [34].FCM has been used in many fields such as image segmentation [35][36][37].
The fuzzy set was conceived as a result of an attempt to come to grips with the problem of imprecisely defined categories [34].K-means determines whether or not a group of objects form a cluster.Different from k-means, FCM determines the belonging of an object to a class with a matter of degree [34,38].When objects X consists of k compact well separated clusters, k-means generates a limiting partition with membership functions which closely approximate the characteristic functions of the clusters.However, when X is not the union of k compact well separated clusters, the limiting partition is truly fuzzy in the sense that the values of its component membership functions differ substantially from 0 or 1 over certain regions of X.The fuzzy algorithm seems significantly less prone to the "cluster-splitting" tendency and may also be less easily diverted to uninteresting locally optimal partitions Dunn [33].
Let O = {o 1 , o 2 , . . ., o N } be a set of objects.The standard fuzzy c-means objective function for partitioning O into e clusters is given by [37,39]: where u ik ∈ [0, 1] is the degree of o k belonging to the i-th cluster, α is the fuzzy indicator, d ik = o k − c i is the distance between o k and c i , and c i is the center of the i-th cluster.
We also require the sum of the degrees of each object be 1, i.e., ∑ e i=1 u ik = 1 ∀k ∈ [1.
.N].In many applications, the Euclidean distance is employed to compute d ik , and β = 2.
For an optimisation problem with M particles.Let x i (t) denote the best solution that particle i has obtained until iteration t .Let x * (t) denote the best solution obtained among all the particles so far.To search for the optimal solution, each particle updates its velocity v i (t) and position x i (t) according to Equation ( 2) and (3) [46] : where t is the current iteration, rand 1 and rand 2 are the random variables in the range of [0, 1], δ 1 and δ 2 are the two positive acceleration constants to adjust relative velocity with respect to best global and local positions and the inertia weight w is used to balance the capabilities of the global exploration.

The Proposed Algorithm
The algorithm framework has been illustrated in Figure 1.The range detection stage is composed of vertical window sliding and horizontal window sliding.The first-arrival picking stage is composed of PSO and FCM.This section describes each stage in detail.

Range Detection
This subsection explains the range detection stage with a vertical sliding window and a horizontal sliding window.
We apply a vertical sliding window to capture the energy which is large, early and shift abrupt in each trace.Let the window size be l and the starting index of the current window of the j-th trace be λ j .We design the following optimization objective function: where a is the energy ratio weight.Here, the first part expresses the ratio between the upper and lower part of the window.The smaller the value, the larger the shift.The second part expresses the evaluation of the position of the window.The smaller the value, the earlier the travel time.The weight a is used to obtain a trade-off between these two values.Algorithm 1 lists the vertical window sliding process.Line 1 initializes the minimal value of the object function value r * .Lines 2 to 10 show the process of sliding a vertical window with a step size of k.Line 3 calculates the sum of the upper part of the window us.Line 4 calculates the sum of the lower part of the window ls.Line 5 computes object function value of the current window according to Equation (4).Lines 6 to 9 determine if the update condition has been reached.If so, update the minimal value of the object function value r * and the starting index of the vertical window λ j in Lines 7 and 8.

Algorithm 1: Vertical Window Sliding for One Trace
Input: The j-th trace s •,j = [s 1,j , s 2,j , . . ., s m,j ], window size l, ratio weight a and search step size k.Output: The starting index of the result window λ j .Method: verticalSliding.We apply a horizontal window to adjust the neighboring first-arrival intervals determined by the vertical windows.Median filtering is employed to smooth the first-arrival intervals in the window to ensure their similarity.Definition 2. First-arrival range matrix is an l × n matrix R = [s i,j ], where l is the size of vertical window and n is the number of traces.It saves the first-arrival range including first arrivals.This matrix stores the range of the first arrivals.The size of the original data set S has been reduced from m × n to l × n.
Algorithm 2 lists the horizontal window sliding process.Lines 1 to 9 show the process of sliding a horizontal window.Line 2 moves the horizontal window in step size b.Line 3 obtains the median of the window m.Lines 4 to 7 determine whether the difference between each element in the window and the median is too large.If so, update this value with a large difference to the median in Line 6.
After the above steps, range detection stage has been completed and the first-arrival range expressed by the first-arrival range matrix (R) has been confirmed.This is one kind of dimensionality reduction techniques [47] and data size is directly reduced by 90%.Just as pre-processing can help increase the accuracy [48], range detection stage can be viewed as a pre-processing stage.end for 9: end for 10: return Λ;

First-Arrival Picking from the Range
This subsection explains the first-arrival picking stage with PSO and FCM.The data field at this stage is the first-arrival range confirmed by range detection stage.
We employ PSO to find the original clustering centers of FCM according to the advantages of PSO including global optimization and fast convergence [49].Specifically, we use the following fitness function: where θ and π are the parameters of fitness function with the constraint θ ≤ π .The J FCM is the objective function of the FCM clustering method we employed.The parameter θ was usually proposed as 2 [39].The particle swarm velocity iterative update formula is Equation (2).The particle swarm position iterative update formula is Equation (3).Definition 3. Boundaries are represented by a g × 2 matrix B = [b i,j ], where b i,1 is the lower bound, and b i,2 is the respective upper bound.
Here, we have two boundaries, the position boundary and the velocity boundary.Let B 1 be the position boundary, and let B 2 be the velocity boundary.
Algorithm 3 lists the process of particle swarm optimization.Lines 1 to 4 initialize each of the particle's position x i and velocity v i with random values.Line 5 initializes iteration times t.Lines 6 to 11 calculate the fitness function and record best solution of each particle according to Equation (5).Line 9 records best solution of each particle itself.Line 12 finds the optimal particle.Line 14 updates the global optimal particle.Lines 16 to 21 update velocity and position of each particle.Line 17 updates the velocity of each particle according to Equation (2).Lines 18 and 20 determine whether the velocity and position are out of boundaries.Line 19 updates the position of each particle according to Equation (3).
We employ FCM to pick first arrivals according to the similarity of the first-arrival energy values of adjacent traces.The fuzzy c-means algorithm iteratively calculates on the seismic data set to obtain the clustering center that minimize the objective function.the number of particles M, the inertia weight of each particle's velocity δ 1 , the global influence weight δ 2 , the inertia weight w, the maximum iteration times T and the convergent error ε.Output: Solution of the best particle x * .Method: particleSwarmOptimization.

20:
x i = check_and_adjust(x i , B 1 ); // Check and adjust 21: end for 22: t = t + 1; 23: end while 24: return x * ; Definition 4. First-arrival range matrix is an l × n × e matrix U = [u i,j,k ], where l is the height of the first-arrival range, n is the number of traces, e is the number of clustering centers, and u i,j,k is the membership degree of s i,j belonging to the k-th cluster.
We use the following objective function: where γ is the fuzzy indicator, d k (i, j) = s i,j − c k is the distance between s i,j and c k , and c k is the center of the k-th cluster.The membership degree is updated according to The clustering center is updated according to Algorithm 4 lists the process of fuzzy c-means.Line 1 initializes membership matrix U. Line 2 computes clustering objective function value J according to Equation (6).Lines 3 to 6 iteratively update membership matrix U and clustering center array x * .Line 4 updates membership matrix U according to Equation (7).Line 5 updates clustering center array x * according to Equation (8).x * = update_centers_X(U, R, γ); // Update x * according to Equation (8) 6: end while// Check the convergence 7: return U, x * ; After the FCM processing, e clustering centers can be fixed.The data is divided into 10 classes and one of the classes is the result of first-arrival picking.After the above steps, the first-arrival picking stage has been completed and the first-arrival travel times have been confirmed.

Experimental Results
This section shows the experimental results with two data sets.Figure 3a shows the field microseismic data consists of 280 shots from Xinjiang, China.Every shot has about 400 traces and time sampling interval is 2 ms. Figure 3b shows the result of range detection.Figure 3c shows the result of FPSF. Figure 4a shows the field microseismic data consists of 150 shots from Sichuan, China.Every shot has about 500 traces and time sampling interval is 2 ms. Figure 4b shows the result of range detection.Figure 4c shows the result of FPSF.  Figure 5a shows Xinjiang field microseismic data.Figure 5b shows the comparison of the difference among the values by the MCM method (purple spots), BNN method (blue spots), DC method (green spots) and FPSF (red spots).Table 2 shows the accuracy of each method for different data sets.We can find out that FPSF is more accurate than BNN, DC on the two data sets and MCM on one data set.In general, FPSF shows superiority over BNN, DC and MCM on the two data sets.

Conclusions
At the current stage of developments of artificial intelligence, we urgently need new theories applied to seismic exploration rather than sticking to the theory of seismic exploration itself.In this paper, we propose the first-arrival picking through sliding windows and fuzzy c-means (FPSF) algorithm.The biggest difference from the conventional methods that are currently available is that our method does not directly pick up the first-arrival travel times but determines a first-arrival range before picking.Combined with the characteristics of seismic data, we have improved some methods, such as particle swarm optimization and fuzzy c-means.Experiments with Xinjiang's data set with 280 shots and Sichun's data set with 150 shots show that the proposed method can significantly improve accuracy and stability.In the future, we will apply the idea of FPSF to other domains such as image processing.

Figure 1 .
Figure 1.The framework of the FPSF algorithm (In the middle part, red indicates first-arrival intervals or initial clustering centers; In the right-top part, red indicates the first-arrival range; In the right-bottom part, red indicates first arrivals).

Figure
Figure 2a illustrates an original single shot gather with 1000 samples and 800 traces.The horizontal coordinate is the number of traces.The vertical coordinate is the number of samples.The amplitude values of samples are the energy values.Black and white points correspond to positive and negative amplitudes, respectively.To cope with different geophones and different traces, we pre-process our data normalized to the range [−1, 1].

Figure 2 .
Figure 2. Field data and first-arrival travel times.

Algorithm 3 :
PSOInput:The fitness function f , the matrices of position boundary B 1 and velocity boundary B 2 ,

Figure 3 .
Figure 3. First arrivals picked by FPSF for field microseismic record.(a) field microseismic record; (b) the result of range detection; (c) the result of FPSF.
(a) microseismic data (b) result of range detection (c) result of FPSF

Figure 4 .
Figure 4. First arrivals picked by FPSF for field microseismic record.(a) field microseismic record; (b) the result of range detection; (c) the result of FPSF.
(a) microseismic data (b) the results of different methods