Fault Detection Based on Fully Convolutional Networks (FCN)

It is of great significance to detect faults correctly in continental sandstone reservoirs in eastern China in order to understand the distribution of remaining structural reservoirs for more efficient development operations. However, the majority of these faults are characterized by small displacements and unclear components, which makes them hard to recognize in seismic data via traditional methods. We consider fault detection as an end-to-end binary image-segmentation problem of labeling a 3D seismic image with ones on faults and zeros elsewhere. Thus, we developed a fully convolutional network (FCN) based method for fault segmentation and used synthetic seismic data to generate an accurate and sufficient training data set. The architecture of the FCN is a modified version of VGGNet (a convolutional neural network named after the Visual Geometry Group). Transforming fully connected layers into convolution layers enables a classification net to create a heatmap, and adding deconvolution layers produces an efficient network for end-to-end dense learning. Herein, we took advantage of the fact that a fault binary image is highly biased, with mostly zeros and only very limited ones on the faults: a balanced cross-entropy loss function was defined to adjust for this imbalance when optimizing the parameters of our FCN model. Ultimately, the FCN model was applied to real field data, showing that it can outperform conventional methods in predicting faults from seismic images in a more accurate and efficient manner.


Introduction
In both unconventional and conventional reservoirs in eastern China, faults play a major role in the lateral sealing of thin reservoirs and in controlling the accumulation of the remaining oil [1][2][3]. Almost all of the developed oil and gas fields in eastern China are distributed in rift basins, which are rich in oil and gas resources and have highly developed, very complex fault systems [4][5][6]. Based on current theories and techniques, significant difficulties still exist in the accurate identification and characterization of faults. This is because a variety of faults can develop in rift basins, such as normal faults, normal oblique-slip faults, oblique faults, and strike-slip faults, with different combinations in plan and section views: most are broom shaped, comb shaped, goose-row shaped, or parallel interlaced in plan view, and most are Y-shaped or negative-flower shaped in section [7,8]. In a rift basin, the filling of sediments, the development and distribution of sedimentary sequences, and the formation, distribution, and evolution of oil and gas pools (including the formation and effectiveness of traps, hydrocarbon migration, and accumulation) are closely related to the distribution and activities of faults [9,10]. Therefore, fine detection and characterization of faults in the rift basins of eastern China has become a key basic geological problem for oil and gas exploration and development and a central topic of basin tectonic research.

Illustration of FCN
CNN has achieved great success in the field of image classification, and several network models such as VGGNet and AlexNet [36] have emerged as a result. Due to its multi-layer structure, a CNN can learn features automatically at multiple levels. For example, a shallow convolution layer has a small receptive field and learns local features, whereas a deeper convolution layer has a larger receptive field and learns more abstract features. Because these abstract features are less sensitive to the size, position, and direction of an object, CNN cannot precisely recognize the outline of an object and its corresponding pixels, though it is highly capable of improving classification performance and can distinguish the types of objects in an image. Going beyond image classification, FCN was proposed for image segmentation and has become the basic framework of semantic segmentation [2,35,37-39]. In a conventional CNN, as shown in Figure 1, fully connected layers are added at the end of the network to obtain 1D category probability information. This probability information can only identify the category of the whole image, not the category of each pixel. In FCN, transforming fully connected layers into convolution layers enables a classification net to output a heatmap. As shown in Figure 2, FCN is an end-to-end, pixel-to-pixel network.
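To make the FC-to-convolution transformation concrete, here is a minimal NumPy sketch (the 4 × 4 patch size and the weights are illustrative, not taken from the paper): an FC layer trained on a fixed-size feature map can be slid over a larger map as a convolution kernel, so a single classification score becomes a spatial heatmap.

```python
import numpy as np

def fc_as_conv(feat, w_fc, kh=4, kw=4):
    """Slide an FC layer (weights for a flattened kh x kw patch) over a
    feature map as a convolution; each output pixel is the FC score of
    one patch, so a larger input yields a heatmap instead of one score."""
    H, W = feat.shape
    heat = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            heat[i, j] = feat[i:i + kh, j:j + kw].ravel() @ w_fc
    return heat

rng = np.random.default_rng(0)
w = rng.standard_normal(16)                                # FC weights for a 4 x 4 map
print(fc_as_conv(rng.standard_normal((4, 4)), w).shape)    # single score: (1, 1)
print(fc_as_conv(rng.standard_normal((6, 6)), w).shape)    # heatmap: (3, 3)
```

On the original 4 × 4 input the result is exactly the FC output; on a larger input the same weights produce a dense map of scores, which is the behavior exploited by FCN.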
Figure 1. The size of the image input in a convolutional neural network (CNN) is fixed at N × N. After each pooling, the size of the feature map becomes smaller accordingly. Three 1 × 1 fully connected (FC) layers follow a stack of convolutional layers. The prediction result is 1D category probability information.
Figure 2. After each pooling, the size of the feature map becomes smaller accordingly. Transforming fully connected layers into convolution layers enables a classification net to output an H × W heatmap, and adding deconvolution layers produces an efficient network for end-to-end dense learning.
The typical architecture of the FCN is shown in Figure 3. The core component of FCN is the convolutional layer, which is mainly responsible for feature learning. It contains several feature maps processed by convolution kernels. Each convolution kernel processes data solely within its receptive field with the same shared weights, thus reducing the number of free parameters and allowing FCN to be deeper with fewer parameters. The convolutional layer is calculated as follows:

conv(i, j) = R\left( \sum_{u=1}^{M} \sum_{v=1}^{M} w_{u,v} \, x_{i+u,\,j+v} + b \right)

where conv(i, j) is the convolution result, also known as the feature map; M indicates the size of the convolution kernel (M × M); w_{u,v} is the weight of the convolution kernel in line u and column v; x_{i+u,j+v} is the input; b is the bias; and R is the activation function, which introduces a nonlinear factor that allows FCN to approximate any nonlinear function. The rectified linear unit (ReLU) function [40] is used as the activation function in most neural networks. The ReLU function can be expressed as R(x) = max(0, x), which helps save computational cost, reduce the vanishing gradient problem, and alleviate overfitting.
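As a concrete illustration of the formula above, the following NumPy sketch computes one feature map of a convolutional layer followed by a ReLU activation (the kernel, bias, and input values are illustrative):

```python
import numpy as np

def conv2d_relu(x, w, b):
    """Valid 2D convolution followed by ReLU, matching
    conv(i, j) = R(sum_u sum_v w[u, v] * x[i+u, j+v] + b)."""
    M = x.shape[0], w.shape[0]
    M = w.shape[0]                           # kernel size (M x M)
    H, W = x.shape
    out = np.zeros((H - M + 1, W - M + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(w * x[i:i + M, j:j + M]) + b
    return np.maximum(out, 0.0)              # ReLU: R(x) = max(0, x)

out = conv2d_relu(np.ones((4, 4)), np.ones((3, 3)), 0.0)
print(out)                                   # each pixel sums nine ones: a 2 x 2 map of 9.0
```

Sharing the same kernel weights across all positions is what keeps the parameter count independent of the image size.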
The pooling layer follows the convolutional layers and is used for nonlinear down-sampling. The pooling layer can reduce the number of dimensions and parameters by combining the outputs of neuron clusters into a single neuron. Pooling can be performed in two separate ways: average pooling and max pooling. Average pooling uses the average value from the feature maps of the prior layer, whereas max pooling takes the maximum value. In modern networks, max pooling is most often used [41] and can be expressed as

R_{u,v} = \max_{x_i \in D_{u,v}} x_i

where x_i is the value of each neuron in the region D_{u,v} and R_{u,v} is the value after max pooling.
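The max-pooling operation above can be sketched in a few lines of NumPy (the 2 × 2 window is illustrative):

```python
import numpy as np

def max_pool2d(x, k=2):
    """Non-overlapping k x k max pooling:
    R[u, v] is the maximum of the neurons x_i in region D[u, v]."""
    H, W = x.shape
    x = x[:H - H % k, :W - W % k]                  # drop ragged edges
    return x.reshape(H // k, k, W // k, k).max(axis=(1, 3))

print(max_pool2d(np.arange(16.0).reshape(4, 4)))   # [[ 5.  7.] [13. 15.]]
```

Each 2 × 2 block collapses to one neuron, halving both spatial dimensions while keeping the strongest activation.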
In CNN, the convolution layers are closely followed by the FC layers. The FC layers connect every neuron to all the neurons of the former layer, and the flattened matrix goes through the FC layers to get a dense prediction, which is used to classify the images. However, FCN replaces these FC layers with fully convolutional layers.
Through the previous multiple convolution operations, we obtain the final feature map. On this basis, multiple up-sampling operations are carried out to make the output consistent with the input size, thus obtaining a pixel-level prediction. We often use deconvolution, the inverse process of convolution (also called transposed convolution), to up-sample. In the convolution process, the pooling layers make the feature map smaller and smaller, and much useful information is lost; if we perform up-sampling directly with deconvolution, the prediction results will be very rough. Therefore, we build a skip architecture and crop useful feature information to refine the prediction [35]. Through the skip architecture, the detailed features of the lower layers can be fused with those of the deep layers. Combining fine layers and coarse layers allows the model to make local predictions that at the same time respect the global structure.
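The deconvolution step described above can be sketched as zero-stuffed up-sampling followed by an ordinary convolution; the stride, kernel, and crop-and-sum skip step here are a simplified, illustrative stand-in for the learned layers in the network.

```python
import numpy as np

def transposed_conv2d(x, w, stride=2):
    """Transposed (de)convolution: insert stride-1 zeros between input
    samples, then run a full convolution, up-sampling a coarse map."""
    H, W = x.shape
    M = w.shape[0]
    up = np.zeros(((H - 1) * stride + 1, (W - 1) * stride + 1))
    up[::stride, ::stride] = x                   # zero-stuffing
    up = np.pad(up, M - 1)                       # "full" convolution padding
    out = np.zeros((up.shape[0] - M + 1, up.shape[1] - M + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(w * up[i:i + M, j:j + M])
    return out

coarse = np.ones((4, 4))                         # coarse deep-layer prediction
up = transposed_conv2d(coarse, np.ones((3, 3)))  # up-sampled, shape (9, 9)
fine = np.ones((8, 8))                           # finer lower-layer prediction
fused = up[:8, :8] + fine                        # crop and sum: the skip step
```

The final crop-and-sum line mirrors the skip architecture: the up-sampled coarse map is trimmed to the size of the finer map and the two are added.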

Architecture of Our FCN
The proposed FCN is established by modifying VGGNet, one of the classic CNN architectures. The training time required for VGGNet is significantly less than that required for AlexNet [40]. A variety of VGGNet architectures exist, with different numbers of layers. Figure 4 shows the commonly used VGG16 architecture.
We take VGG16 as the foundation for our network. Figure 5 displays the architecture of our FCN. First, we changed the input dimensions to 128 × 128 × 128, the size of the 3D seismic image. Moreover, we replaced the FC layers with fully convolutional layers and added deconvolution layers behind them. In the convolution part, each step contains several 3 × 3 × 3 convolutional layers followed by a ReLU activation and a 2 × 2 × 2 max pooling operation with stride 2 for down-sampling. In the deconvolution part, every step contains three 3 × 3 × 3 deconvolutional layers and a 2 × 2 × 2 max unpooling layer. The output of our FCN is the fault probability body, where 1 represents fault and 0 represents nonfault. Because the initial values of the weights in the convolution layers are random, there is a deviation between the prediction and the actual labels in the early stage of training; it is therefore necessary to use the stochastic gradient descent algorithm to continuously update the network parameters and reduce the value of the loss function until the prediction and the labels gradually converge. Because most of the prediction results are nonfault, almost 90% of the values are 0. We used the following balanced cross-entropy loss function, as discussed in [32], to account for this imbalance:

L = -\beta \sum_{i} y_i \log(p_i) - (1 - \beta) \sum_{i} (1 - y_i) \log(1 - p_i)

where \beta represents the ratio between nonfault pixels and the total image pixels, whereas 1 − \beta denotes the ratio of fault pixels in the 3D seismic image; p_i represents the probability of a fault, and y_i is the label value.
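The balanced cross-entropy loss described above, which up-weights the scarce fault pixels, can be sketched in NumPy (the function name and the epsilon clipping are illustrative, not from the paper):

```python
import numpy as np

def balanced_cross_entropy(p, y, eps=1e-7):
    """Balanced cross-entropy for biased binary segmentation:
    beta is the fraction of nonfault (zero) pixels, so the rare fault
    pixels are weighted by beta and nonfault pixels by 1 - beta."""
    p = np.clip(p, eps, 1.0 - eps)        # avoid log(0)
    beta = 1.0 - y.mean()                 # ratio of nonfault pixels
    return float(-(beta * y * np.log(p)
                   + (1.0 - beta) * (1.0 - y) * np.log(1.0 - p)).sum())

y = np.array([0.0, 0.0, 0.0, 1.0])        # one fault pixel out of four
print(balanced_cross_entropy(np.array([0.1, 0.1, 0.1, 0.9]), y))
```

Without the beta weighting, a network could reach low loss by predicting "nonfault" everywhere; the weighting makes missed fault pixels expensive.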

Synthesizing Seismic Data Sets
It is important to synthesize seismic data sets before training the neural network, as this provides sufficient training and validation data for our network. The synthetic seismic data sets are from open-source data sets [28] and are all automatically generated by randomly adding folding, faulting, and noise to the volumes. The simplified workflow to synthesize seismic data sets is as follows:

1) The horizontal reflectivity model is designed as h(x, y, z) with a sequence of random values in the range [−1, 1].
2) Use Equation (4) to generate a fold structure f_1(x, y, z), which combines multiple 2D Gaussian functions with a linear-scale function 2.1z/z_max. The combination of 2D Gaussian functions creates laterally varying folding structures, whereas the linear-scale function dampens the folding vertically from bottom to top. In this equation, each combination of the parameters m_0, n_k, l_k, p_k, σ_k generates specific spatially varying folding structures in the model. By randomly choosing each of the parameters from predefined ranges, we are able to create numerous models with unique structures.
3) Substituting f_1(x, y, z) into h(x, y, z) leads to h(x, y, z + f_1(x, y, z)).
4) Planar shearing of h(x, y, z + f_1(x, y, z)) through f_2(x, y, z) = t_0 + ix + jy leads to h(x, y, z + f_1 + f_2), in which the parameters t_0, i, j are randomly chosen from predefined ranges.
5) Use Equation (5) to add planar faulting to the model h(x, y, z + f_1 + f_2) and create a reflectivity model containing folds and faults, where u_f is the vector representing the dip angle of the fault, v_f is the vector representing the strike of the fault, and w_f is the vector representing the normal direction perpendicular to the strike of the fault; σ_{u_f}, σ_{v_f}, and σ_{w_f} respectively represent the distribution range of the fault in the directions of u_f, v_f, and w_f.
6) Convolve the reflectivity model with a Ricker wavelet to obtain a 3D seismic image.
In order to construct a more realistic synthetic seismic image, random noise is added. From this noisy image, we crop a final training seismic data set (Figure 6a) of size 128 × 128 × 128 to avoid artifacts near the boundaries. Figure 6b illustrates the corresponding binary fault labeling data set, and Figure 7 depicts the faults on the synthetic training data set. The randomly selected vertical sections and time slice are inline 65, crossline 50, and the time slice at 80 ms, respectively.
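The synthesis workflow above can be sketched end to end in NumPy. This is a deliberately simplified stand-in, not the paper's Equations (4) and (5): one Gaussian fold instead of a random combination, a single vertical fault plane with a fixed throw, and illustrative parameter values throughout.

```python
import numpy as np

rng = np.random.default_rng(0)
nx = ny = nz = 64

# 1) horizontal reflectivity model: random values in [-1, 1], constant laterally
refl = rng.uniform(-1.0, 1.0, nz)

# 2-3) fold via a single 2D Gaussian shift field, damped linearly toward the top
x, y, z = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz), indexing="ij")
gauss = 8.0 * np.exp(-((x - 32) ** 2 + (y - 32) ** 2) / (2.0 * 12.0 ** 2))
shift = (z / (nz - 1)) * gauss                  # linear vertical damping
folded = refl[np.clip((z + shift).astype(int), 0, nz - 1)]

# 5) add a planar fault: drop one side of a vertical plane by 4 samples
faulted = folded.copy()
faulted[32:] = np.roll(folded[32:], 4, axis=2)

# 6) convolve each trace with a Ricker wavelet, then add random noise
def ricker(f=25.0, dt=0.004, n=33):
    t = (np.arange(n) - n // 2) * dt
    a = (np.pi * f * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

w = ricker()
seismic = np.apply_along_axis(lambda tr: np.convolve(tr, w, mode="same"), 2, faulted)
seismic += 0.05 * rng.standard_normal(seismic.shape)
```

The binary fault label volume for this sketch would simply be ones on the x = 32 plane and zeros elsewhere; randomizing the fold, fault, wavelet, and noise parameters per sample is what yields a diverse training set.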
To generate sufficient training data to optimally train the neural network for fault segmentation, we randomly chose parameters of faulting, folding, wavelet peak frequency, and noise to obtain 300 pairs of unique 3D seismic images and corresponding fault labeling images by using this workflow. Using the same workflow, we also automatically generated 30 pairs of seismic and fault labeling images for validation. To increase the diversity of the data sets and to prevent our FCN model from learning irrelevant patterns, we applied simple data augmentations including vertical flips and rotation around the vertical time or depth axis. When rotating the seismic and fault labeling volumes, we have six options: 45°, 90°, 135°, 180°, 225°, and 270°.
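These augmentations can be sketched as follows. Right-angle rotations around the vertical axis are exact with `np.rot90`; the 45°, 135°, and 225° rotations used in the paper would additionally require interpolation (e.g. `scipy.ndimage.rotate`), so only the exact cases are shown here.

```python
import numpy as np

def augment(volume, k=1, flip=False):
    """Rotate an (x, y, z) volume by k * 90 degrees around the vertical
    (z) axis and optionally flip it along the time/depth axis."""
    out = np.rot90(volume, k=k, axes=(0, 1))   # rotation in the x-y plane
    if flip:
        out = out[:, :, ::-1]                  # vertical flip
    return out

# the same augmentation must be applied to the seismic image and its label
seis = np.zeros((8, 8, 8)); seis[2, 5, :] = 1.0
rot = augment(seis, k=1)                       # fault trace rotated 90 degrees
```

Applying identical transforms to image and label keeps each augmented pair consistent.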

Training and Validation
We trained our FCN model by using the 300 pairs of synthetic 3D seismic and fault images that were automatically created, as shown in Figures 6 and 7. The validation data set contains another 30 pairs of such synthetic seismic and fault images, which are not used in the training data set. Prior to training, each image has its mean value subtracted and is divided by its standard deviation. This normalization is necessary because the amplitude values of different real seismic images can differ from one another. The training data sets are used to train a given model and optimize the parameters, whereas the validation data sets are used to evaluate the model during the training process and prevent overfitting. We fed the 3D seismic images to the FCN model in batches. Each batch contains seven images, which consist of an original image and its rotations around the vertical time/depth axis by 45°, 90°, 135°, 180°, 225°, and 270°. If adequate GPU memory is available, a larger batch size can be tried. We train the network for 30 epochs, and all 300 training images are processed at each epoch.
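The per-image normalization described above is a one-liner; the small epsilon guard is an illustrative addition, not from the paper:

```python
import numpy as np

def normalize(image, eps=1e-8):
    """Zero-mean, unit-standard-deviation normalization applied to each
    3D image before training, so volumes with different amplitude
    ranges become comparable."""
    return (image - image.mean()) / (image.std() + eps)
```

Because the same statistics are computed per volume at inference time, real field data with arbitrary amplitude scaling can be fed to the trained model directly.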
To make up-sampling more detailed, we divide the training process into three stages in which the deconvolution stride gets smaller at each stage. In the first stage, as shown in Figure 8, the deconvolution stride is 32. In the second stage, we carry out the training with stride 16, as presented in Figure 9. During this process, there are two deconvolution operations. Before the second deconvolution, we crop the prediction results of the third pooling layer. Next, deconvolution is applied to obtain the predicted results of 128 × 128 × 128 by using the skip architecture to sum the first deconvolution result and the cropped result. In the last stage, we perform the training with stride 4, with three deconvolution operations. Before the third deconvolution, the prediction results of the first pooling layer are cropped. Then, deconvolution with stride 4 is performed by employing the skip architecture to sum the second deconvolution result and the cropped result, as shown in Figure 10. Considering Figure 11, the training and validation accuracies gradually increase to 95%, whereas the training and validation loss converges to 0.01 after 30 epochs.
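The crop-and-sum skip step repeated at each stage above can be sketched as follows; nearest-neighbor up-sampling stands in for the learned deconvolution, and the shapes are illustrative:

```python
import numpy as np

def upsample2(x):
    """Nearest-neighbor x2 up-sampling, a stand-in for a learned
    stride-2 deconvolution."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def skip_fuse(coarse, pooled):
    """One skip step: up-sample the coarse prediction, crop the
    prediction from an earlier pooling layer to match, and sum them."""
    up = upsample2(coarse)
    return up + pooled[:up.shape[0], :up.shape[1]]

deep = np.ones((4, 4))        # coarse prediction from the deepest layer
early = np.zeros((9, 9))      # prediction from an earlier pooling layer
fused = skip_fuse(deep, early)
print(fused.shape)            # (8, 8): cropped to the up-sampled size
```

Chaining such steps with progressively smaller effective strides (32, 16, 4) is what lets the final prediction combine coarse global structure with fine local detail.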

Application
The trained FCN model is applied to automatic fault interpretation of real field seismic data. The study area is located in an oil field in eastern China, where complicated faults are widely present in the target formation [42-44]. Above 1700 ms, faults appear, and most of them are Y-shaped in profiles. The fault features are more complex below 1700 ms; however, due to the extensive presence of igneous rocks in the Dongying Formation, the quality of the seismic data is seriously deteriorated there, and fault picking becomes poor and challenging. In plan view, the faults are affected by tensile and strike-slip stress regimes, and the fault strike is mainly NE and NW. This data set consists of 500

Figure 11. (a) The training and validation accuracy both increase with epochs, whereas (b) the training and validation loss decreases with epochs.

Compared with the FCN prediction (Figure 12b), the fault likelihood attribute has picked an abundance of horizontal fault features (Figure 12c), which are geologically unrealistic. Figure 13b,c illustrates the fault detection results at different slices. We observed that most faults are clearly detected by the trained FCN model, and multiple sets of faults striking in different directions are distinguished on the horizontal slice. Figure 13c shows the fault likelihood at the same slice, which was able to detect most of the faults, but the features are much noisier than in the FCN fault slice.
In summary, the field data example demonstrates that the proposed FCN-based method has superior performance in detecting faults and provides relatively higher sensitivity and continuity with less noise. In addition, fault prediction using the trained FCN model is highly efficient compared to seismic attributes to detect faults for the same volume, when common normal workstations are being used.   (Figure 12b). In addition, the fault likelihood has picked an abundance of horizontal fault features (Figure 12c), which are geologically unrealistic. Figure 13b,c illustrates the fault detection results at different slices. We observed that most faults are clearly detected under the trained FCN model, and multiple sets of faults striking in different directions are distinguished on the horizontal slice. Figure 13c is the fault likelihood at the same slice, which were able to detect most of the faults, but the features are much noisier than the FCN fault slice.
In summary, the field data example demonstrates that the proposed FCN-based method has superior performance in detecting faults and provides relatively higher sensitivity and continuity with less noise. In addition, fault prediction using the trained FCN model is highly efficient compared to seismic attributes to detect faults for the same volume, when common normal workstations are being used.   (Figure 12b). In addition, the fault likelihood has picked an abundance of horizontal fault features (Figure 12c), which are geologically unrealistic. Figure 13b,c illustrates the fault detection results at different slices. We observed that most faults are clearly detected under the trained FCN model, and multiple sets of faults striking in different directions are distinguished on the horizontal slice. Figure 13c is the fault likelihood at the same slice, which were able to detect most of the faults, but the features are much noisier than the FCN fault slice.
In summary, the field data example demonstrates that the proposed FCN-based method is superior at detecting faults, providing relatively higher sensitivity and continuity with less noise. In addition, fault prediction with the trained FCN model is highly efficient compared to attribute-based fault detection on the same volume when run on common workstations.
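Because the network is fully convolutional, the trained model can be applied to sub-cubes of varying size; a common way to process a full field volume (the paper does not detail its tiling scheme, so this is an assumption) is to slide overlapping patches across the volume and average the overlapping predictions. A minimal numpy sketch, where the hypothetical `model` callable stands in for the trained FCN:

```python
import numpy as np

def predict_volume(model, volume, patch=64, stride=32):
    """Patch-wise fault prediction over a 3D seismic volume.

    `model` is assumed to map a sub-cube to fault probabilities of the
    same shape (hypothetical interface). Overlapping predictions are
    accumulated and averaged to suppress tiling seams.
    """
    prob = np.zeros(volume.shape, dtype=np.float32)
    hits = np.zeros(volume.shape, dtype=np.float32)
    nx, ny, nz = volume.shape
    for i in range(0, max(nx - patch, 0) + 1, stride):
        for j in range(0, max(ny - patch, 0) + 1, stride):
            for k in range(0, max(nz - patch, 0) + 1, stride):
                sub = volume[i:i + patch, j:j + patch, k:k + patch]
                prob[i:i + patch, j:j + patch, k:k + patch] += model(sub)
                hits[i:i + patch, j:j + patch, k:k + patch] += 1.0
    # average where patches overlapped; untouched voxels stay zero
    return prob / np.maximum(hits, 1.0)
```

With a stride smaller than the patch size, each voxel receives several predictions, which smooths patch-boundary artifacts at the cost of more forward passes.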

Conclusions
We developed an FCN-based method to automatically detect faults in the continental sandstone reservoirs of eastern China. The architecture of the FCN is a modified version of VGGNet. We trained our FCN model using only 300 pairs of 3D synthetic seismic and fault volumes, all of which were generated automatically. Because the distribution of fault and non-fault samples is heavily imbalanced, a balanced loss function was defined to optimize the FCN model parameters. In the network training process, we employed a skip architecture and repeated crop operations to improve the accuracy of the prediction results. The field application confirmed that the FCN outperforms common automatic fault detection methods (attributes) and is highly robust to noise, providing a sharp image of the faults even in complex structures.
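The balanced cross-entropy used here is conventionally built by weighting each class by the frequency of the opposite class, so the rare fault pixels are up-weighted. A minimal numpy sketch of that weighting (the paper's exact formulation and hyperparameters are not reproduced; this follows the standard class-balanced form):

```python
import numpy as np

def balanced_cross_entropy(p, y, eps=1e-7):
    """Class-balanced binary cross-entropy.

    p : predicted fault probabilities in (0, 1)
    y : binary labels (1 = fault, 0 = non-fault)

    beta is the fraction of non-fault pixels; it scales the fault term
    while (1 - beta) scales the non-fault term, compensating for the
    heavy zero/one imbalance of fault label images.
    """
    p = np.clip(p, eps, 1.0 - eps)   # avoid log(0)
    beta = 1.0 - y.mean()            # proportion of non-fault pixels
    loss = -(beta * y * np.log(p)
             + (1.0 - beta) * (1.0 - y) * np.log(1.0 - p))
    return loss.mean()
```

With, say, 1% fault pixels, beta ≈ 0.99, so missing a fault pixel costs roughly two orders of magnitude more than misfiring on a background pixel, which prevents the trivial all-zeros solution from minimizing the loss.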