1. Introduction
Since the discovery of shale oil in the Williston Basin in 1953, the United States has achieved remarkable progress in shale oil development through more than six decades of continuous technological innovation and exploration, drawing extensively upon the theoretical and engineering foundations established in shale gas research. Although shale oil exploration and development in China commenced later than in North America, rapid advancements have been made in recent years, and the country possesses widespread and abundant continental shale oil resources [1,2,3]. Significant breakthroughs have already been realized in the Paleogene Kongdian and Shahejie formations of the Bohai Bay Basin, the Permian Lucaogou and Fengcheng formations of the Junggar Basin, the Cretaceous Qingshankou Formation of the Songliao Basin, and the Triassic Yanchang Formation of the Ordos Basin [4,5,6].
With the accelerated industrial development of the Chang 7 shale oil interval in the Ordos Basin, the degree of natural fracture development has become a critical factor for sweet-spot selection and fracturability evaluation. Well-developed fractures facilitate the formation of complex fracture networks during hydraulic stimulation, thereby enabling large-scale commercial production [3,7]. Traditionally, natural fractures have been identified through laboratory core analysis, well-log interpretation, and seismic interpretation. However, core analysis is limited to point-scale observations [8]; seismic datasets typically provide insufficient resolution and limited spatial coverage [9,10]; and fracture identification from logging data depends heavily on specialized logging tools and often involves extensive mathematical processing, high costs [11], and time-consuming workflows [12,13].
To overcome the limitations of traditional methods, recent studies have explored fracture characterization using various algorithms, including back-propagation neural networks (BPNN), probabilistic neural networks (PNN), support vector machines (SVM), clustering analysis, and decision-tree methods [14,15,16]. These studies have demonstrated that artificial intelligence (AI) techniques can effectively identify and predict fractures when large, high-quality datasets are available [17,18,19,20,21,22,23]. In 2014, Mohammad Ali Ahmadi et al. [
17] employed a hybrid approach combining least squares support vector machines (LSSVM), artificial neural networks (ANN), fuzzy logic, Kalman filtering, and genetic algorithms (HFKGA) to predict water breakthrough time in weakly fractured reservoirs, utilizing extensive field data from northern Persian Gulf oil fields. In 2018, Qamar Yasin et al. [
18] introduced a fracture identification constant (FIC) by integrating conventional and specialized logging tools. They classified reservoir intervals into homogeneous subgroups based on mineralogy, lithology, facies, and pore-fluid types, and developed a theoretical model to convert fracture-related logging responses into positive indicators. Their results indicated that cumulative anomalies across all logging curves corresponded to fracture density variations. In 2021, Meysam Rajabi et al. [
19] proposed a fracture density prediction method using feature selection from twelve logging input variables. Their hybrid model combined a multiple extreme learning machine (MELM) network and a multilayer perceptron (MLP) algorithm, optimized through genetic algorithms (GA) and particle swarm optimization (PSO). In 2022, Qamar Yasin et al. [
20] presented a novel framework for identifying natural fractures in carbonate reservoirs by integrating conventional logging with seismic reflection data. Using a hybrid deep neural network (DNN) and clustering-based model, they predicted spatial variations in lithology, porosity, and fracture parameters derived from seismic inversion. In the same year, Somayeh Tabasi et al. [
21] applied rock-physics logging and machine learning to predict fracture density (FVDC) in the Asmari fractured carbonate reservoir of Iran’s Marun oilfield. They utilized firefly and artificial bee colony optimizers to enhance the performance of distance-weighted k-nearest neighbor (DWKNN) and MLP networks, confirming accurate fracture detection and density prediction. In 2023, Bo Liu et al. [
22] developed a fracture density prediction model leveraging seismic attributes, convolutional neural networks (CNN), and long short-term memory (LSTM) networks to forecast fracture spatial distributions. The predictions were validated using seismic fracture attributes, geological modeling, and formation micro-resistivity image (FMI) data. In the same year, Shanbin He et al. [
23] analyzed fractured cores and outcrop profiles, investigated key controlling factors of fracture development, and proposed an improved curve-rate method for fracture prediction based on conventional logging data, demonstrating effectiveness for Triassic Chang 6 tight sandstone reservoirs in the Ordos Basin, China.
Traditional machine learning methods (e.g., support vector machines, decision trees, and clustering analysis) can process structured logging data but rely on manual feature engineering and often fail to capture spatial fracture features in image logs. Deep learning approaches, particularly CNNs, enable automatic feature extraction from images but typically demand large datasets, incur high training costs, and are limited by single-modal input, which restricts the integration of multi-source logging data. Moreover, most existing models are designed either for fracture detection or for predicting a single fracture parameter, rather than simultaneously identifying fractures and predicting multiple fracture attributes from multimodal inputs.
In this study, the Chang 7 shale reservoir in the Ordos Basin is selected as a case study, and a novel multi-modal deep neural network approach is proposed for natural fracture identification and parameter prediction. By integrating conventional well-logging data and borehole imaging logs, a coupled CNN–DNN architecture is developed to simultaneously identify fracture occurrence and predict fracture dip and aperture. The innovation of this work lies in the application of a multi-modal neural network structure, wherein the CNN is used to extract spatial features from imaging logs, while the DNN captures complex non-linear relationships from conventional log data. This combined model greatly improves the comprehensiveness and accuracy of fracture characterization. Note that while FMI data are required as labeled data for model training, the trained model itself only requires conventional logs as input for fracture prediction, thereby reducing the dependence on costly FMI acquisition in ongoing development operations. Overall, the proposed approach provides an efficient and cost-effective solution for natural fracture evaluation based solely on conventional logging data and offers a practical tool for geoengineering integration in shale reservoirs. In future work, the model will be further optimized and generalized through the incorporation of additional regional datasets.
2. Geological Background
The study area is situated in the southwestern Ordos Basin, bounded by Huachi to the north, Ning County to the south, Tarwan to the east, and Qingyang to the west (
Figure 1). Structurally, it occupies the Qingyang nose-shaped structural belt in the middle-to-lower portion of the Yishan Slope and covers an area of approximately 2170 km². Based on sedimentary cycles, electrical properties, and hydrocarbon-bearing characteristics, the area is subdivided into ten oil-bearing intervals, numbered Chang 10 to Chang 1 from bottom to top. Shale oil in the Ordos Basin is predominantly distributed within the Chang 7 interval of the Triassic Yanchang Formation. Chang 7 mainly consists of mudstone and shale, interbedded with sandy turbidites within the deep-lake-facies oil shale of eastern Longdong, which are oil-bearing [3,4,5,6]. These lithologies were deposited during the peak development of the Yanchang Formation lake basin and constitute important source rocks, commonly referred to as the “Zhangjiatan Shale”; they are widely distributed throughout the lake basin. In well logs, they exhibit a characteristic “three highs and one low” pattern: high resistivity, high natural gamma, high sonic travel time, and low spontaneous potential. Within Chang 7, the Chang 7-1 and Chang 7-2 intervals (interbedded type) comprise mudstone and shale interlayered with multiple thin fine-grained sandstone layers and currently represent the primary targets for exploration and development [5]. The Chang 7-3 interval (shale type), predominantly composed of mudstone and shale, serves as the main target for high-risk exploration and in situ conversion experiments [4].
3. Methodology
3.1. Data Preparation
A total of 7480 sample points were collected from the Chang 7 shale oil reservoirs across eight vertical wells in the study area, all of which have fracture identification results and corresponding formation micro-resistivity image (FMI) logs. Seven logging parameters that most effectively represent formation characteristics were selected as input features: acoustic travel time (AC), caliper log (CAL), compensated neutron log (CNL), density log (DEN), natural gamma ray (GR), resistivity log (RT), and spontaneous potential (SP).
The logging curves were standardized using the StandardScaler function. Data were grouped by well and padded or truncated to 100 samples per well to ensure consistency. Corresponding FMIs were resized to 128 × 128 grayscale and augmented with a channel dimension; they were padded or truncated similarly to align with the logging data. Standardization was applied only to columns 3–9 (excluding well name and depth), followed by scaling to 0–1 to improve neural network training, mitigate instability from inconsistent units, and enhance model convergence.
Time-series samples were generated by slicing the seven logging parameters along the well-depth direction into fixed-length sequences (e.g., 10 consecutive data points per sample). For each sequence, the end-depth of the well segment was recorded as the label for supervised learning or auxiliary use. The two-dimensional tabular data were then transformed into three-dimensional arrays (samples × time steps × features) suitable for neural network input, simulating the logging process along well depth and effectively capturing the spatial characteristics of fracture development.
In summary, the preprocessing workflow converts the 7480 samples from the eight wells into depth-aligned multimodal inputs: standardized logging sequences rescaled to [0, 1] and organized as samples × time steps × features, together with matching 128 × 128 single-channel FMI images.
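The scaling and windowing steps described above can be sketched as follows. This is a minimal pure-Python illustration, not the study's actual pipeline: the function names are hypothetical, and the study used scikit-learn's StandardScaler and 10-point sequences along the well depth.

```python
# Minimal sketch of the preprocessing described above: z-score
# standardization, rescaling to [0, 1], and slicing depth-ordered
# logging curves into fixed-length sequences. Function names and
# the toy data are illustrative only.
from statistics import mean, pstdev

def standardize(curve):
    """Z-score a single logging curve, then rescale it to [0, 1]."""
    mu, sigma = mean(curve), pstdev(curve) or 1.0
    z = [(v - mu) / sigma for v in curve]
    lo, hi = min(z), max(z)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in z]

def make_sequences(curves, window=10):
    """Slice parallel curves (one list per log) into overlapping
    windows, giving a (samples x time steps x features) structure."""
    n = len(curves[0])
    samples = []
    for start in range(n - window + 1):
        # one sample: `window` depth points, each with len(curves) features
        samples.append([[c[start + t] for c in curves] for t in range(window)])
    return samples

# Example: two standardized curves, 12 depth points each
ac = standardize([220.0 + i for i in range(12)])
gr = standardize([90.0 - i for i in range(12)])
seqs = make_sequences([ac, gr], window=10)
print(len(seqs), len(seqs[0]), len(seqs[0][0]))  # 3 10 2
```

With 12 depth points and a 10-point window, three overlapping samples result, each carrying both features at every depth step.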
3.2. Construction Method of the Multimodal Neural Network Model
The research framework for identifying natural fractures in shale reservoirs is presented in
Figure 2. Drawing on a review of various approaches from previous studies [11,14,15,25,26,27], the relevant data were selected to characterize the distribution patterns of natural fractures [16]. To overcome existing challenges, a multimodal neural network model was developed and subsequently optimized for fracture identification [28]. A well was randomly chosen, and its formation imaging log results were employed to validate the predictions of the deep neural network, thereby confirming the model’s applicability [16,29,30,31].
1. Fracture Identification Module (Dual-Input CNN)
This module performs binary classification of fractures and non-fractures by jointly processing two distinct data modalities:
Conventional Logging Data Input: Seven one-dimensional logging sequences (AC, CAL, CNL, DEN, GR, RT, SP) are structured as a three-dimensional tensor with dimensions (samples × time steps × features), with each depth point treated as a sequential step. The tensor is passed through two TimeDistributed fully connected layers (32 and 64 neurons, respectively), which extract high-level feature representations while preserving the sequential structure of the logging data along the depth axis.
FMI Data Input: The corresponding two-dimensional FMIs, resized to 128 × 128 grayscale format, are input to the CNN branch. This branch comprises three convolutional layers with 32, 64, and 128 filters, each followed by 2 × 2 max-pooling operations and ReLU activation functions to hierarchically extract spatial features associated with fracture morphology.
Feature Fusion and Classification: Feature vectors extracted from both the logging and imaging branches are concatenated along the feature dimension. The resulting fused feature vector is then processed through a series of fully connected layers and ultimately passed to a single neuron with a sigmoid activation function for binary prediction. To mitigate overfitting, a Dropout layer (rate = 0.5) is incorporated before the final layer. The model is optimized using binary cross-entropy loss and the Adam optimizer.
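The dual-input module described above can be sketched in Keras as follows. Layer widths (32/64 TimeDistributed dense units, 32/64/128 convolutional filters, dropout rate 0.5, sigmoid output) follow the text; the 3 × 3 kernel size and the 64-unit fusion layer are assumptions, since the paper does not specify them.

```python
# Sketch of the dual-input fracture identification module (Keras).
# The 3x3 kernels and 64-unit fusion dense layer are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

# Branch 1: conventional logs, (time steps, features) = (100, 7)
log_in = layers.Input(shape=(100, 7))
x = layers.TimeDistributed(layers.Dense(32, activation="relu"))(log_in)
x = layers.TimeDistributed(layers.Dense(64, activation="relu"))(x)
x = layers.Flatten()(x)

# Branch 2: FMI image, 128 x 128 grayscale with a channel axis
img_in = layers.Input(shape=(128, 128, 1))
y = img_in
for filters in (32, 64, 128):
    y = layers.Conv2D(filters, 3, activation="relu")(y)
    y = layers.MaxPooling2D(2)(y)
y = layers.Flatten()(y)

# Fusion: concatenate both branches, dense head, dropout, sigmoid output
z = layers.concatenate([x, y])
z = layers.Dense(64, activation="relu")(z)
z = layers.Dropout(0.5)(z)
out = layers.Dense(1, activation="sigmoid")(z)

model = Model([log_in, img_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The single sigmoid neuron yields a fracture probability per interval, matching the binary cross-entropy objective stated in the text.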
2. Fracture Parameter Prediction Module (DNN)
This module employs a deep neural network (DNN) architecture designed specifically to predict two key fracture parameters: dip angle and aperture.
Architecture: The DNN follows a fully connected architecture, accepting eight preprocessed logging features as input. These features are processed through three hidden layers containing 256, 128, and 64 neurons, respectively. Each hidden layer utilizes the ReLU activation function and is regularized with a Dropout layer (rate = 0.2) and L2 weight regularization (coefficient = 0.001) to enhance generalization performance.
Output and Training: The output layer consists of two linear neurons corresponding to the predicted fracture dip angle and aperture. Model training aims to minimize the mean squared error (MSE) using the Adam optimizer (learning rate = 0.001) with a batch size of 32. An early stopping strategy is implemented, whereby training ceases if the validation loss improvement remains below 1 × 10−5 for 20 consecutive epochs, up to a maximum of 200 epochs.
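A sketch of this regression module, following the hyperparameters given in the text (256/128/64 ReLU layers, dropout 0.2, L2 coefficient 0.001, two linear outputs, Adam at learning rate 0.001, MSE loss, early stopping with min_delta 1 × 10⁻⁵ and patience 20); restoring the best weights on stop is an added assumption:

```python
# Sketch of the fracture parameter prediction module (Keras DNN).
import tensorflow as tf
from tensorflow.keras import layers, regularizers, Model

inp = layers.Input(shape=(8,))  # eight preprocessed logging features
h = inp
for units in (256, 128, 64):
    h = layers.Dense(units, activation="relu",
                     kernel_regularizer=regularizers.l2(0.001))(h)
    h = layers.Dropout(0.2)(h)
out = layers.Dense(2)(h)  # linear outputs: [dip angle, aperture]

dnn = Model(inp, out)
dnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
            loss="mse")

# Early stopping as described: stop when validation loss improves by
# less than 1e-5 for 20 consecutive epochs, up to 200 epochs total.
stopper = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", min_delta=1e-5, patience=20,
    restore_best_weights=True)  # restore_best_weights is an assumption
# dnn.fit(X_train, y_train, validation_data=(X_val, y_val),
#         batch_size=32, epochs=200, callbacks=[stopper])
```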
3. Integrated Multimodal Workflow
The complete framework operates sequentially: the Fracture Identification Module first classifies intervals as fractured or non-fractured, and data from the intervals identified as fractured are then passed to the Fracture Parameter Prediction Module to estimate dip angle and aperture. By integrating spatial features from imaging data with sequential patterns from logging data, this multimodal approach provides a cost-effective, efficient, and comprehensive solution for identifying natural fractures in shale reservoirs and quantifying their key geometric attributes. The model employs TimeDistributed wrapper layers to preserve the sequential structure (each depth point treated as a time step) and a Dropout layer (rate = 0.5) to prevent overfitting; binary cross-entropy loss and the Adam optimizer are used for mini-batch training to accommodate memory constraints. The model’s key innovations are multimodal feature fusion and depth-sequence processing, which effectively integrate numerical logging and imaging data. Data padding was applied to handle wells of varying depths while preserving stratigraphic continuity.
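The two-stage flow can be sketched as a small driver function. Here `classify` and `predict_params` stand in for the trained identification and parameter models; the function names, the toy stand-ins, and the 0.5 decision threshold are all illustrative assumptions.

```python
# Sketch of the sequential two-stage workflow: a classifier flags
# fractured intervals, and only those are passed to the regressor.
def run_pipeline(classify, predict_params, intervals, threshold=0.5):
    results = []
    for interval in intervals:
        prob = classify(interval)
        if prob >= threshold:  # stage 1: fracture identification
            dip, aperture = predict_params(interval)  # stage 2
            results.append({"fractured": True, "prob": prob,
                            "dip": dip, "aperture": aperture})
        else:
            results.append({"fractured": False, "prob": prob})
    return results

# Toy stand-ins for the trained models (hypothetical logic)
classify = lambda iv: 0.9 if iv["gr"] > 100 else 0.1
predict_params = lambda iv: (72.0, 0.3)

out = run_pipeline(classify, predict_params,
                   [{"gr": 120}, {"gr": 80}])
print(out[0]["fractured"], out[1]["fractured"])  # True False
```

Gating the regressor on the classifier's output mirrors the text: dip and aperture are only estimated where a fracture is first identified.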
3.3. Neural Network Algorithms
(1) Convolutional Neural Network
A convolutional neural network (CNN) typically comprises five types of layers. The input layer receives images or other data, convolutional layers extract low-level features, and pooling layers reduce dimensionality while mitigating overfitting. Fully connected layers integrate the extracted features, and the output layer generates predictions, usually selecting the class with the highest probability [11,14,15,25,26,27].
Unlike conventional neural networks, CNN neurons are arranged in three dimensions: width, height, and depth (
Figure 3). For input layers, width and height denote the spatial dimensions, while depth represents the number of channels—three for RGB images and one for grayscale. In intermediate layers, width and height correspond to feature map dimensions determined by convolution and pooling parameters, and depth indicates the number of feature map channels, typically defined by the number of convolutional kernels.
(2) Deep Neural Networks
The foundations of deep neural networks (DNNs) trace back to 1943, when American neurophysiologist Warren McCulloch and mathematician Walter Pitts proposed the first mathematical model of an artificial neuron [29]. DNNs are a subclass of artificial neural networks (ANNs) that emulate human brain information processing through multiple layers of neurons, enabling the solution of complex, data-driven problems.
A DNN is a multilayer neural network in which neurons are interconnected to form a deep architecture. It can process diverse data types, including images, text, and speech. The number of hidden layers and neurons per layer depends on the specific task and data characteristics. Training a DNN typically involves backpropagation and gradient descent optimization, iteratively adjusting network parameters to minimize prediction errors and the loss function.
The training process consists of four primary steps: forward propagation, backward propagation, weight gradient computation, and weight updating. As illustrated in
Figure 4, training data are fed into the network in batches, with forward computation proceeding layer by layer until the output layer is reached. The network output is then compared to the true labels, and the loss is calculated using a loss function. During backward propagation, gradients of the loss function with respect to each layer are computed via the chain rule. These weight gradients are then used to update network weights, completing the training cycle.
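The four steps above can be illustrated with a toy example: a single linear neuron trained by gradient descent on synthetic data (fitting y = 2x). All values here are invented for illustration and are not from the study.

```python
# Toy illustration of the four training steps: forward pass, loss
# computation, gradient computation via the chain rule, weight update.
data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]  # targets: y = 2x
w, lr = 0.0, 0.05  # single weight, learning rate

def mse(w):
    # loss: mean squared error between predictions w*x and labels y
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

loss_before = mse(w)
for epoch in range(100):
    # forward: predictions are w * x; loss is mse(w)
    # backward: dL/dw = mean of 2 * (w*x - y) * x  (chain rule)
    grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
    # update: step in the negative gradient direction
    w -= lr * grad
loss_after = mse(w)
print(round(w, 3), loss_after < loss_before)  # 2.0 True
```

The weight converges to the true slope of 2, and the loss decreases monotonically, reproducing the forward/backward/update cycle in miniature.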
(3) Optimizer
Stochastic Gradient Descent (SGD) is a widely used optimizer and one of the most fundamental algorithms in deep learning. It implements the gradient descent method by updating neural network weights based on the gradient computed from each training sample, which is why it is also referred to as online learning. Compared to batch gradient descent (BGD), SGD is more efficient, particularly for large datasets. Model parameters are updated in the direction of the negative gradient, gradually minimizing the loss function. During training, the partial derivatives of each sample’s error with respect to all parameters are computed and used to update the parameters iteratively until convergence or until a predefined maximum number of iterations is reached.
SGD usually requires manual tuning of the learning rate, and techniques such as learning rate decay are often applied to improve convergence. The choice of learning rate strongly influences SGD performance. Convergence is typically slow in flat regions near local minima, but the inherent stochasticity can help the model escape shallow minima and achieve better generalization.
Adaptive Moment Estimation (Adam) is an optimizer with adaptive learning rates, developed from momentum gradient descent and adaptive learning rate methods. Adam assigns different weights to different gradients, enabling faster and more stable convergence. It combines momentum and RMSProp by computing exponential moving averages of the first moment (mean) and second moment (uncentered variance) of the gradients, followed by bias correction to ensure accurate estimates during early training. This correction is crucial; without it, underestimated moment estimates would lead to overly small update steps and slow convergence [32].
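The moment estimates and bias correction described above can be written compactly as the standard Adam update, with $g_t$ the gradient at step $t$, $\beta_1$ and $\beta_2$ the decay rates, $\eta$ the learning rate, and $\epsilon$ a small stabilizing constant:

```latex
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^{2}

\hat{m}_t = \frac{m_t}{1 - \beta_1^{\,t}}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^{\,t}}, \qquad
\theta_t = \theta_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```

The division by $1 - \beta_1^{\,t}$ and $1 - \beta_2^{\,t}$ is the bias correction: since $m_0 = v_0 = 0$, the raw averages are biased toward zero in early steps, and without this correction the updates would be too small.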
Adam automatically adjusts learning rates, usually eliminating the need for manual tuning, and often converges faster than SGD during the initial training phase. However, despite its rapid convergence in many tasks, Adam may lead to overfitting in some cases, and its generalization ability can be inferior to that of SGD.
Model training was conducted with both the Adam and SGD optimizers, each with an initial learning rate of 0.001. To ensure stability and comparability between the two training processes, no dynamic learning rate scheduling was applied. The convergence criterion was defined as follows: training ceased if the validation loss showed no significant improvement (a decrease of less than 1 × 10−5) over 20 consecutive epochs, or when the preset maximum of 200 epochs was reached. In practice, both optimizers converged stably after approximately 150 training epochs.
5. Conclusions
This study addresses natural fracture identification and parameter prediction in the Chang 7 shale oil reservoirs of the Ordos Basin through a multimodal deep neural network (DNN) model integrating conventional logging curves and formation imaging data. The key findings are summarized as follows:
(1) Standardized conventional logging data revealed that lower AC, CNL, and GR values correspond to reduced fracture probabilities, while CAL, DEN, RT, and SP exhibit elevated fracture probabilities within specific ranges, highlighting the nonlinear coupling between fracture distribution and logging responses.
(2) The CNN model effectively fused logging and imaging data, achieving stable convergence with ~87% identification accuracy. PR and ROC analyses indicated robust performance (F1 = 0.88, AUC = 0.95), with predicted fracture locations closely matching measured data.
(3) SGD-optimized DNN models improved fracture inclination accuracy by 2.9% (training) and 3.64% (testing), and fracture width accuracy by 7.76% (training) and 8.56% (testing). Post-optimization, fit accuracies reached 98.82% for inclination and 95.97%/95.91% for width.
(4) Prediction errors were <0.48° for inclination and <0.21 cm for width. Inclinations clustered between 65° and 80°, and widths between 0.5 and 0.54 cm, demonstrating strong generalization and precise parameter control.
(5) Comparison with 15 FMI-measured fracture samples from a randomly selected well confirmed model accuracy, with maximum deviations of 0.48° (inclination) and 0.21 cm (width, excluding a 1.96 cm outlier), supporting the model’s applicability for shale reservoir fracture prediction.
The proposed model demonstrates promising potential for engineering applications within the Chang 7 shale formation of the Ordos Basin. Its architecture is designed to be scalable, and it offers a cost-effective advantage during the prediction phase by relying solely on conventional logging data.