Hybrid Architecture for Tight Sandstone: Automated Mineral Identification and Quantitative Petrology
Abstract
1. Introduction
1. Boundary ambiguity: Grains in tight sandstone have highly blurred boundaries. Existing segmentation algorithms such as the traditional U-Net still show a misjudgment rate of 10%–12% and fail to distinguish grain contact boundaries from microcracks accurately.
2. Mineral confusion: Quartz, feldspar, and lithic fragments share overlapping characteristics. Mainstream ResNet models show a 15%–25% error rate in mineral classification and particularly struggle to differentiate alteration minerals from native minerals.
3. Structural feature deficiency: Current methods neglect grain contact relationships, lack quantitative analysis of cementation types, and exhibit measurement errors exceeding ±0.3 grade (Krumbein standard).
4. Small-sample challenge: Professionally annotated data are scarce, transfer learning generalizes poorly across regional samples, and models are sensitive to image noise.
1. Zero-shot SAM pre-segmentation is fused with a Mask R-CNN instance segmentation network to improve the accuracy of grain boundary segmentation.
2. Serialized modeling is combined with a conventional CNN structure: three token mixer units (depthwise separable convolution, weighted mixed pooling, and the cascaded group attention (CGA) mechanism for global attention) are adopted to build an improved MetaFormer model for accurate mineral identification.
3. The GeoParam module is designed to quantify the key structural features and output statistical analysis results for tight sandstone images.
2. Architecture and Methods
2.1. System Architecture
2.2. Core Module Design
2.2.1. Fusion Segmentation Module: SAM Pre-Segmentation + Mask R-CNN Segmentation Module
2.2.2. Recognition Module Based on Improved MetaFormer
1. Dynamic Size Adjustment Module: Resolves inconsistent image sizes in sandstone micrographs by ensuring that input dimensions are divisible by the patch size, which prevents invalid computations.
2. Multi-scale Feature Extraction Layer: The shallow layers employ hybrid pooling and DSConv blocks to enhance local features. The deep layers use improved multi-scale attention (MSA) to aggregate global semantics, balancing local details with global features.
3. Classification Head: Maps features to category probabilities through LayerNorm and fully connected layers to produce the classification result.
Algorithm 1 Sandstone Particle Recognition Algorithm
Input: Input image I, number of classes C, patch size P = 16, shallow layers Lₛ = 8, deep layers L_d = 4
Output: Class probability vector p ∈ ℝ^C
1: I′ ← DynamicResize(I, P) // Dynamic resizing
2: F ← PatchEmbedding(I′, P) // Patch embedding
3: for l ← 1 to Lₛ do
4:     k ← 5 if l = 1 else 3 // Large kernel for first layer
5:     F ← ShallowBlock(F, k) // Shallow feature extraction
6: end for
7: for l ← 1 to L_d do
8:     F ← DeepBlock(F) // Deep feature extraction
9: end for
10: p ← ClassificationHead(F, C) // Classification
11: return p
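Step 1 of Algorithm 1 requires input dimensions that are divisible by the patch size. The following is a minimal sketch of that size calculation only. The function names `padded_size` and `num_patches` are hypothetical, and rounding up (padding) rather than cropping or rescaling is an assumption, since the source does not state how DynamicResize adjusts the image.

```python
def padded_size(height: int, width: int, patch: int = 16) -> tuple[int, int]:
    """Smallest (height, width) >= the input that is divisible by `patch`.

    Assumption: the image is padded up to the next multiple of the patch
    size; the source does not say whether it pads, crops, or rescales.
    """
    if patch <= 0:
        raise ValueError("patch must be positive")
    # Ceiling division, then scale back up to a multiple of the patch size.
    new_h = -(-height // patch) * patch
    new_w = -(-width // patch) * patch
    return new_h, new_w


def num_patches(height: int, width: int, patch: int = 16) -> int:
    """Number of non-overlapping patches after rounding the size up."""
    new_h, new_w = padded_size(height, width, patch)
    return (new_h // patch) * (new_w // patch)
```

With P = 16, a 1024 × 768 micrograph already satisfies the constraint and is left unchanged, whereas a 1000 × 770 image would be rounded up to 1008 × 784 under this assumption.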
Algorithm 2 Hybrid Spatial-Channel Attention
Input: Input tensor X ∈ ℝ^(B×N×D)
Output: Output tensor Y ∈ ℝ^(B×N×D)
1: B, N, D ← shape(X)
2: H ← √N, W ← √N // Calculate spatial dimensions
3: // 1. Spatial attention
4: QKV ← Linear(X) // Project to Q, K, V
5: Q, K, V ← split(QKV)
6: A ← softmax(QKᵀ/√dₕ) // Attention weights
7: O_spatial ← A·V
8: // 2. Channel attention
9: X_spatial ← reshape(X, B, D, H, W)
10: C_attn ← ChannelAttn(X_spatial)
11: // 3. Feature fusion
12: O_fused ← reshape(O_spatial, B, D, H, W) ⊗ C_attn
13: O_seq ← reshape(O_fused, B, N, D)
14: // 4. Output projection
15: Y ← Linear(O_seq)
16: return Dropout(Y)
// Channel Attention Module
Function ChannelAttn(X):
1: C ← AdaptiveAvgPool2d(1)(X)
2: C ← Conv2d(C, D→D/4)
3: C ← ReLU(C)
4: C ← Conv2d(C, D/4→D)
5: return Sigmoid(C)
End Function
Algorithm 3 Mixed Pooling
Input: Input feature F ∈ ℝ^(B×N×D), mixing ratio α = 0.5
Output: Output feature F′ ∈ ℝ^(B×N×D)
1: F_max ← MaxPool1d(F, kernel = 3) // Max pooling
2: F_max ← Conv1d(F_max, 1 × 1)
3: F_avg ← AvgPool1d(F, kernel = 3) // Average pooling
4: F_avg ← Conv1d(F_avg, 1 × 1)
5: F_mix ← α·F_max + (1 − α)·F_avg // Weighted fusion
6: F′ ← Conv1d(F_mix, 1 × 1) // Feature enhancement
7: return F′
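The weighted fusion in step 5 of Algorithm 3 can be illustrated on a single one-dimensional sequence. This is a simplified sketch, not the module itself: it omits the 1 × 1 convolutions of steps 2, 4, and 6, operates on a plain Python list rather than a B × N × D tensor, and assumes stride 1 with windows clipped at the sequence edges so that the output length equals the input length. The name `mixed_pool_1d` is hypothetical.

```python
def mixed_pool_1d(seq, alpha=0.5, kernel=3):
    """Blend max pooling and average pooling over a sliding window.

    Assumptions (not stated in the source): stride 1, odd kernel size,
    and edge windows clipped to the valid part of the sequence.
    """
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive odd integer")
    half = kernel // 2
    out = []
    for i in range(len(seq)):
        window = seq[max(0, i - half): i + half + 1]
        pooled_max = max(window)
        pooled_avg = sum(window) / len(window)
        # Weighted fusion: alpha = 1 is pure max pooling, alpha = 0 pure averaging.
        out.append(alpha * pooled_max + (1.0 - alpha) * pooled_avg)
    return out
```

With α = 0.5, the default in Algorithm 3, each output is the midpoint between the local maximum and the local mean, which keeps strong responses while damping isolated spikes.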
Algorithm 4 Depthwise Separable Convolution Block (Simplified)
Input: Input sequence X with shape (B, N, D)
Input: Stride S, expansion ratio E = 4
Output: Output sequence Y with shape (B, N′, D)
1: // 1. Input normalization and reshape
2: X_norm ← LayerNorm(X)
3: X_spatial ← reshape(X_norm, (B, D, √N, √N))
4: // 2. Depthwise separable convolution
5: Y_conv ← DepthwiseConv2D(X_spatial, K = 3, stride = 1, groups = D)
6: Y_conv ← PointwiseConv2D(Y_conv)
7: // 3. Channel expansion and downsampling
8: Y_exp ← PointwiseConv2D(Y_conv, out_channels = E × D)
9: Y_exp ← ReLU6(BatchNorm(Y_exp))
10: Y_ds ← DepthwiseConv2D(Y_exp, K = 3, stride = S)
11: // 4. Channel projection and reshape
12: Y_proj ← PointwiseConv2D(Y_ds, out_channels = D)
13: Y_proj ← BatchNorm(Y_proj)
14: Y_seq ← reshape(Y_proj, (B, D, N′))
15: // 5. Residual connection and MLP
16: if stride S = 1 and same spatial size then
17:     Y_out ← Y_seq + reshape(X_norm, (B, D, N))
18: else
19:     Y_out ← Y_seq
20: end if
21: Y_mlp ← MLP(LayerNorm(Y_out))
22: Y ← Y_out + Y_mlp
23: return Y
2.2.3. GeoParam Parameter Analysis Module
1. The calculation of grain-size characteristics comprises the following steps:
   - Equivalent diameter method: extract the longest side of each grain's minimum bounding rectangle and convert it to physical size (mm) using the scale of the micrograph.
   - Cumulative frequency curve: divide the φ scale into 32 intervals, compute the grain area distribution, and derive key percentile values such as D50 and D84.
   - Sedimentary environment parameters: mean grain size (Mz), standard deviation (σ), skewness (Sk1), and kurtosis (Kg).
2. The roundness quantification system is built in two steps:
   - Define the contour geometry index $R_d = 4\pi A / P^2$ (A is the contour area, P is the contour perimeter [21]).
   - Establish five-level classification criteria (Table 2).
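The roundness step can be sketched as follows. This sketch assumes the contour geometry index is the Cox roundness 4πA/P² of reference [21]; the class boundaries are taken from Table 2, with each upper bound treated as inclusive (consistent with the "≤0.5" and ">0.85" entries). The function names are hypothetical.

```python
import math


def roundness_index(area: float, perimeter: float) -> float:
    """Contour geometry index, assumed here to be 4*pi*A / P**2 (Cox, 1927)."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area / perimeter ** 2


def roundness_class(rd: float) -> str:
    """Five-level classification following the ranges of Table 2.

    Assumption: upper bounds are inclusive, e.g. Rd = 0.7 is Sub-angular.
    """
    if rd <= 0.5:
        return "Angular"
    if rd <= 0.7:
        return "Sub-angular"
    if rd <= 0.8:
        return "Sub-rounded"
    if rd <= 0.85:
        return "Rounded"
    return "Well-rounded"
```

Under this definition a perfect circle has an index of 1, while a unit square has π/4 ≈ 0.785 and would fall in the Sub-rounded class.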
1. The sorting analysis comprises:
   - Sorting coefficient, Equation (6):
   - Classification rules:
2. The contact relationship between grains is determined as follows:
   - Spatial topology analysis: compute the intersection over union (IoU) of grain contours, the equivalent diameter ratio (Req), and the contact line length (L).
   - Three-level classification standard:
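The grain-size statistics that feed the sorting analysis can be sketched from φ percentiles of the cumulative frequency curve. This sketch does not reproduce Equation (6), which is not recoverable from the text. It assumes the graphic measures of Folk and Ward [20] and linear interpolation on the cumulative curve. The function names are hypothetical, and the output keys only mirror the field names that appear in the GeoParam output (`Mz`, `StandardDeviation`, `Sk1`, `Kg`).

```python
def phi_percentile(phis, cumulative, p):
    """φ value at cumulative percentage p, by linear interpolation.

    `phis` must be ascending and `cumulative` non-decreasing (in percent).
    """
    pairs = list(zip(phis, cumulative))
    for (x0, c0), (x1, c1) in zip(pairs, pairs[1:]):
        if c0 <= p <= c1 and c1 > c0:
            return x0 + (p - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("percentile lies outside the cumulative curve")


def folk_ward(p):
    """Folk and Ward (1957) graphic measures from φ percentiles.

    `p` maps the percentages 5, 16, 25, 50, 75, 84, 95 to φ values.
    """
    mz = (p[16] + p[50] + p[84]) / 3.0
    sd = (p[84] - p[16]) / 4.0 + (p[95] - p[5]) / 6.6
    sk = ((p[16] + p[84] - 2.0 * p[50]) / (2.0 * (p[84] - p[16]))
          + (p[5] + p[95] - 2.0 * p[50]) / (2.0 * (p[95] - p[5])))
    kg = (p[95] - p[5]) / (2.44 * (p[75] - p[25]))
    return {"Mz": mz, "StandardDeviation": sd, "Sk1": sk, "Kg": kg}
```

A symmetric distribution yields a skewness of zero under these formulas, and a lower standard deviation indicates better sorting.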
1. Porosity is the percentage of the total pore area in an image relative to the total image area.
   - Calculation, Equation (7): $\text{Porosity} = \dfrac{\sum_i A_i}{H \times W} \times 100\%$, where $A_i$ is the area of the i-th pore and H and W are the image height and width.
   - Calculation logic: Extract the image height (imageHeight) and width (imageWidth) from json_data and compute the total image area. Traverse each pore shape in json_data, compute its area with the cv.contourArea function, and accumulate these areas to obtain the total pore area. The porosity is the total pore area divided by the total image area, reported as a percentage.
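The porosity logic above can be sketched without OpenCV. Here the shoelace formula stands in for `cv.contourArea`, which computes the same area for a simple polygon; this substitution is an assumption made to keep the example self-contained. The function names and the choice to select pores by the label `"Pores"` (the label used in the pore annotation output) are also assumptions.

```python
def polygon_area(points):
    """Area of a simple polygon by the shoelace formula.

    Used here as a dependency-free stand-in for cv.contourArea.
    """
    n = len(points)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def porosity_percent(json_data, pore_label="Pores"):
    """Total pore area as a percentage of the total image area."""
    image_area = json_data["imageHeight"] * json_data["imageWidth"]
    pore_area = sum(
        polygon_area(shape["points"])
        for shape in json_data["shapes"]
        if shape.get("label") == pore_label
    )
    return 100.0 * pore_area / image_area
```

For instance, a single 10 × 10 pixel square pore in a 100 × 100 pixel image gives a porosity of 1%.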
1. Pore diameter is the equivalent circular diameter of a pore, calculated from the pore area.
   - Calculation, Equation (8): $D = 2\sqrt{\text{Area}/\pi}$.
   - Computational logic: The equation treats the pore as a perfect circle whose area equals the measured pore area (Area) and solves for its diameter. The output diameter is typically expressed in millimeters or micrometers, depending on the unit of the area.
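The equivalent-diameter calculation can be sketched as below. The scale factor `um_per_pixel` is a hypothetical parameter for converting pixel measurements to micrometers; the source states only that the unit of the diameter follows the unit of the area.

```python
import math


def equivalent_diameter(area: float) -> float:
    """Diameter of the circle whose area equals `area` (same length unit)."""
    if area < 0:
        raise ValueError("area must be non-negative")
    return 2.0 * math.sqrt(area / math.pi)


def equivalent_diameter_um(area_px: float, um_per_pixel: float) -> float:
    """Equivalent diameter in micrometers from an area in square pixels.

    `um_per_pixel` is a hypothetical image scale (micrometers per pixel).
    """
    return equivalent_diameter(area_px) * um_per_pixel
```

Because the diameter scales with the square root of the area, quadrupling the pore area doubles the equivalent diameter.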
1. Naming of terrigenous clastic components (NameTcc):
   - Naming basis: the area proportions of the grain types (quartz, feldspar, lithic fragments).
   - Rule logic diagram (Figure 5):
2. Integrated rock nomenclature (Name):
   - Naming basis: grain size name + terrigenous clastic component name.
   - Grain size classification criteria (Table 3).
3. A typical example:
   - Main grain size: 0.25 mm (φ = 2) → Medium Sandstone;
   - Component name: Feldspar Quartz Sandstone;
   - Final name: Medium Feldspar Quartz Sandstone.
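The typical example above can be reproduced with a short sketch. The φ conversion is the standard φ = −log₂(d / 1 mm), and the coarse classes follow Table 3. The subdivision of sandstone into very coarse, coarse, medium, fine, and very fine uses the Udden–Wentworth boundaries with inclusive lower bounds, which is an assumption here because Table 3 lists only the four coarse classes. The component name is passed in as given, since the naming rules of Figure 5 are not reproduced, and all function names are hypothetical.

```python
import math

# Lower bounds (mm) of the sand subclasses; Udden-Wentworth values, assumed.
SAND_SUBCLASSES = [
    (1.0, "Very Coarse"),
    (0.5, "Coarse"),
    (0.25, "Medium"),
    (0.125, "Fine"),
    (0.0625, "Very Fine"),
]


def phi_value(d_mm: float) -> float:
    """Krumbein phi scale: phi = -log2(d / 1 mm)."""
    return -math.log2(d_mm)


def grain_size_name(d_mm: float) -> str:
    """Grain size name from Table 3, with assumed sand subdivisions."""
    if d_mm > 2.0:
        return "Conglomerate"
    if d_mm < 0.0039:
        return "Mudstone"
    if d_mm < 0.0625:
        return "Siltstone"
    for lower, prefix in SAND_SUBCLASSES:
        if d_mm >= lower:
            return f"{prefix} Sandstone"
    raise ValueError("unreachable for positive grain sizes")


def rock_name(d_mm: float, component_name: str) -> str:
    """Integrated name: grain size prefix + component name (sandstones only)."""
    size_name = grain_size_name(d_mm)
    if not size_name.endswith(" Sandstone"):
        return size_name  # assumption: non-sandstones keep the size name alone
    return f"{size_name[:-len(' Sandstone')]} {component_name}"
```

Calling `rock_name(0.25, "Feldspar Quartz Sandstone")` returns "Medium Feldspar Quartz Sandstone", matching the example in the text.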
3. Materials and Experimental Analysis
3.1. Environment and Materials Preparation
1. Environment: NVIDIA RTX 3090 GPU, Intel Core i5-12490F CPU, 32 GB memory, Windows 10 operating system.
2. Dataset:
   - The micrographs of tight sandstone used in this research were provided by the Oil and Gas Accumulation Laboratory of Sinopec.
   - This study constructs a sandstone image dataset from diverse sources, encompassing core and outcrop samples from multiple critical petroliferous regions, including the Ordos Basin (Hangjinqi and Linxing areas), Tarim Basin (Taxi area), and Bohai Bay Basin (Bozhong Sag). Approximately half of the data were acquired from research projects and laboratory tests, covering key sedimentary facies such as braided river deltas and fan deltas, which represent typical reservoir facies like tight sandstone and marine sandstone. The other half were selected from the Rock Micro-image Thematic Database of the China Scientific Data platform, further enriching the diversity of compositional and textural characteristics. Overall, the image features are both comprehensive and representative, which supports the geological relevance and practical predictive capability of the analysis models.
   - Rock category annotation consists of outlining each rock grain with a polygon defined by points and labeling it. The labeled objects are then extracted from the annotated dataset, and each is cropped into a single image according to its polygon contour, as shown in Figure 6.
   - To date, more than 800 images and more than 81,000 grains have been labeled, forming two datasets: the segmentation dataset SMISD and the recognition dataset SMIRD [28,29]. Image rotation is also used for data augmentation during training, as shown in Figure 7 (the green grain in each panel is the same grain).
   - To ensure the reliability and consistency of the annotated data, a rigorous verification protocol was implemented, comprising multi-annotator independent labeling, cross-validation, and expert review. Three specialists proficient in rock thin-section identification independently annotated a representative subset of 200 images (approximately 25% of the total dataset) using AnyLabeling software, in strict accordance with unified guidelines derived from the SY/T 5368-2016 industry standard. Consistency was quantitatively assessed using Intersection over Union (IoU) for grain boundary delineation (average IoU ≥ 0.85) and Cohen’s Kappa coefficient for mineral categorization (Kappa ≥ 0.82), reflecting substantial to near-perfect inter-annotator agreement. Instances of discrepancy (IoU < 0.7 or categorical mismatch, accounting for 5.3% of the subset) were referred to a senior geologist for arbitration and final resolution. Additionally, throughout data augmentation procedures such as rotation and flipping, specific measures were enforced to maintain semantic accuracy in the annotations and avoid introducing biases. The final annotation consistency of the constructed SMISD and SMIRD datasets reached 91.7%, satisfying the quality threshold necessary for robust deep learning model training.
3. Key dependencies and hyperparameters:
3.2. Experimental Analysis
4. Results and Visualization
4.1. Segmentation Module Outputs
4.2. Identification Module Outputs
{
  "version": "4.5.6",
  "flags": {},
  "shapes": [
    # Identification parameters and boundary coordinate array of the initial segmented grain (pseudocolor yellow, Figure 11).
    {
      "label": "\u77f3\u82f1", # Mineral label; the escaped string decodes to 石英 (quartz).
      "points": [[690.0, 178.0], …], # The vertices array contains polygon boundary coordinates, e.g., [690.0, 178.0].
      "bbox": [616.0, 178.0, 746.0, 341.0], # Minimum bounding rectangle of the grain, enabling rapid spatial indexing.
      "shape_type": "polygon", # Shape annotated as a polygonal boundary.
      "score": 0.9984123706817627 # Confidence score of the detection.
    }, …
    # Remaining detrital grains have analogous segmentation parameters and are omitted.
  ],
  "imagePath": "X53-1693.45m1-.png",
  "imageHeight": 1024,
  "imageWidth": 768
}
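A downstream consumer of this output might load the annotation, recompute each grain's bounding box, and keep only confident detections. The sketch below assumes the comments are stripped so that the file is valid JSON, and that `bbox` is ordered as [x_min, y_min, x_max, y_max]. The polygon vertices after the first, the score threshold, and the function names are illustrative and not taken from the source.

```python
import json

# Illustrative annotation: only the first vertex, bbox, and score come from
# the listing above; the other two vertices are made up to match the bbox.
SAMPLE = """
{
  "version": "4.5.6",
  "shapes": [
    {
      "label": "\\u77f3\\u82f1",
      "points": [[690.0, 178.0], [746.0, 260.0], [616.0, 341.0]],
      "bbox": [616.0, 178.0, 746.0, 341.0],
      "shape_type": "polygon",
      "score": 0.9984123706817627
    }
  ],
  "imageHeight": 1024,
  "imageWidth": 768
}
"""

LABEL_TRANSLATION = {"石英": "quartz"}


def bbox_from_points(points):
    """Axis-aligned bounding box, assumed order [x_min, y_min, x_max, y_max]."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return [min(xs), min(ys), max(xs), max(ys)]


def confident_shapes(annotation, min_score=0.5):
    """Shapes whose detection score reaches `min_score` (hypothetical cutoff)."""
    return [s for s in annotation["shapes"] if s.get("score", 0.0) >= min_score]


annotation = json.loads(SAMPLE)
for shape in confident_shapes(annotation):
    name = LABEL_TRANSLATION.get(shape["label"], shape["label"])
    print(name, bbox_from_points(shape["points"]))
```

Recomputing the bounding box from the polygon is a cheap consistency check on the stored `bbox` field before the grains are passed to the recognition module.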
4.3. Morphometric Analysis Module Outputs
1. Figure 13 presents a bar chart showing the roundness distribution across granulometric classes. Mean particle roundness indices per grain-size fraction are calculated from area/frequency tables of 12 discrete granulometric intervals. The x-axis denotes Udden–Wentworth grain-size classes (e.g., Fine Sand, Medium Sand), while the y-axis represents particle roundness indices (dimensionless scale: 0–1.00).
2. Figure 14 presents a pie chart illustrating the relative abundance of terrigenous detrital components.
3. Figure 15 presents a pie chart of the detrital grain size distribution. This visualization is derived from area/frequency tables across twelve granulometric intervals within the grain-size analysis framework.
4. Figure 16 presents a tripartite grain size distribution plot integrating three graphical representations:
   - Frequency distribution histogram with φ-scale granulometric intervals on the x-axis and areal percentage per fraction on the y-axis;
   - Cumulative frequency curve sharing the same φ-scale x-axis, with cumulative areal percentage on the y-axis;
   - Normal probability cumulative plot on the same φ-scale x-axis, displaying probability percentage values.
5. Figure 17 presents a line plot of areal distribution across granulometric classes. It uses the same methodology as the cumulative frequency plot in the tripartite grain size analysis but replaces φ values with absolute grain size ranges on the x-axis.
{
  "Roundness": [ ‘……’ ], # Roundness indices quantifying angularity for all grains
  "Rdstype": [ ‘……’ ], # Roundness classification list (0-4 scale: e.g., angular, subangular)
  "RdAreaSum": [ ‘……’ ], # Total areal coverage per roundness class (scale-converted)
  "RdPercentage": [ ‘……’ ], # Areal percentage per roundness class relative to total grain area
  "RdResult": "Angular and Subangular", # Dominant roundness types (highest combined percentage classes)
  "MaximumGrainSize": 0.42, # Maximum grain diameter (mm)
  "MaximumGrainSizeTypeφ": 1.511264044967257, # φ-equivalent of maximum grain size
  "MainSizeRange": "No Statistically Dominant Grain-Size Fraction", # Primary granulometric distribution interval
  "SizePercentage": [ ‘……’ ], # Volumetric percentage per sieve-grade fraction (10-class)
  "AccPercentage": [ ‘……’ ], # Cumulative volumetric percentage per fraction (10-class)
  "SizePercentage32types": [ ‘……’ ], # Areal percentage per fraction (32-class)
  "CumFrequency32types": [ ‘……’ ], # Cumulative areal percentage per fraction (32-class)
  "StandardDeviation": 0.4258, # Graphic standard deviation (measure of distribution dispersion)
  "S0": 1.53, # Sorting coefficient (lower values = better sorting)
  "Mz": 2.51, # Graphic mean grain size (overall coarseness indicator)
  "Sk1": 0.0731, # Graphic skewness (distribution asymmetry: + = fine-skewed, - = coarse-skewed)
  "Kg": 0.9305, # Graphic kurtosis (distribution peakedness)
  "Cvalue": 0.3363, # C-value parameter (sedimentary environment discriminator)
  "Mvalue": 0.179, # M-value parameter (sedimentary environment discriminator)
  "MzMoment": 2.5153, # Moment measure mean grain size
  "SDMoment": 0.421, # Moment measure standard deviation
  "Sk1Moment": 0.0358, # Moment measure skewness
  "KgMoment": 2.9268, # Moment measure kurtosis
  "FractionResult": "well", # Sorting evaluation (e.g., "well"/"moderate"/"poor")
  "TotalDebris": 73.17, # Terrigenous component percentage of total sample
  "GrainPercentage": [62.56, 21.79, 15.65], # Relative percentages of grain types within terrigenous fraction
  "NameTcc": "Lithic Feldspar Sandstone", # Lithological classification based on ternary components
  "Name": "Medium Lithic Feldspar Sandstone", # Comprehensive lithological name incorporating granulometry
  "total_contacts": 174, # Total grain-grain contacts (for thin-section reconstruction)
  "Contact_percentage": { "No contact": 10.92, "Point contact": 52.87, "Concavo-convex contact": 36.21 }, # Percentage distribution of contact types
  "Dominant_contact_type": ["Point contact", 52.87], # Prevalent contact type and percentage
  "labels": [ ‘……’ ], # Mineralogical classifications (e.g., quartz, feldspar) for reconstruction matching
  "diameters": [ ‘……’ ], # Grain diameters (φ-scale)
  "perimeters": [ ‘……’ ], # Grain perimeters (scale-converted)
  "areas": [ ‘……’ ], # Grain areas (scale-converted)
  "img_size": 786432 # Image size in pixels (for reconstruction scaling)
}
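The contact statistics in this output can be cross-checked with a short sketch. The per-type counts below (19, 92, 63) are not reported in the source; they are back-calculated here from `total_contacts` and `Contact_percentage` purely for illustration, and the function names are hypothetical.

```python
def contact_percentages(counts):
    """Percentage of each contact type, rounded to two decimals."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no contacts to summarize")
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}


def dominant_contact(percentages):
    """Contact type with the highest percentage, as [name, percentage]."""
    name = max(percentages, key=percentages.get)
    return [name, percentages[name]]


# Hypothetical counts that sum to the reported total of 174 contacts.
counts = {"No contact": 19, "Point contact": 92, "Concavo-convex contact": 63}
percentages = contact_percentages(counts)
print(percentages)
print(dominant_contact(percentages))
```

With these assumed counts the sketch reproduces the reported `Contact_percentage` values and the `Dominant_contact_type` entry, which suggests the percentages are taken over all 174 recorded relationships.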
{
  "version": "4.5.6",
  "shapes": [
    {
      "label": "Pores",
      "points": [[480, 736], [458, 741], [479, 752]],
      "shape_type": "polygon"
    }, …
    # Remaining solvent-filled pores have analogous morphometric parameters and are omitted.
  ],
  "imagePath": "X53-1693.45m1-.png",
  "imageHeight": 1024,
  "imageWidth": 768
}
{
  "porosity": 9.44, # Porosity (%)
  "pore_diameters": [
    21.17, 36.77, 19.75, 27.01, 24.72, 22.27, 20.44, 36.93, 29.58, 31.48, 37.44, 25.31, 25.33, 22.69, 20.23, 22.24, 87.39, 56.0, 31.98, 58.81, 54.79, 60.13, 114.65, 21.34, 19.75, 53.17, 64.8, 26.67, 27.58, 55.69, 21.47, 22.97, 39.84, 25.57, 27.63, 28.45, 21.89, 103.9, 44.19, 51.65, 84.13, 8.24, 29.27, 21.72, 1.36, 48.61, 61.05, 78.93, 55.32
  ] # Equivalent pore diameter list
}
5. Discussion
1. Synergistic mechanism between SAM and Mask R-CNN
   - SAM’s zero-shot pre-segmentation generates annotations guided by dual-polarized light (Figure 3). Combined with an improved boundary optimization loss function for Mask R-CNN (Formula 2), this reduces the segmentation error rate for quartz grains from 12% to 3.8% (Table 5) and lowers the missed-segmentation rate for lithic grains by 7.4% (compared with Ablation 1).
   - Core innovation: A closed-loop “pre-segmentation/correction/precision segmentation” mechanism resolves the boundary ambiguity caused by mineral spectral overlap in traditional methods (Figure 2).
2. MetaFormer cascaded token mixer architecture
   - A three-stage cascade combining DSConv (local features), hybrid pooling (multi-scale features), and CGA (global attention) (Figure 4) achieves 90.65% mineral recognition accuracy on the SMIRD dataset (Table 6), reduces FLOPs by 71%, and cuts inference time per image by 52% (from 25.3 ms to 12.1 ms, Table 1).
   - Core innovation: The design overcomes the MSA redundancy bottleneck of traditional Transformers and, for the first time, achieves cross-modal fusion of mineral spectral features and morphological characteristics.
1. Optical-CL-SEM tiered workflow:
   - Optical screening for rapid bulk mineralogy (<2 min/sample);
   - CL verification for feldspar subtype/zoning analysis;
   - SEM-AM nano-pore/cement quantification.
2. Data-centric augmentation:
   - GAN-synthesized stained/non-stained image pairs to bridge spectral gaps;
   - Strategic oversampling of rare minerals (e.g., glauconite, heavy minerals).
3. Algorithmic robustness engineering:
   - Stain-invariant networks with spectral unmixing modules;
   - Adversarial training for illumination/magnification invariance.
6. Conclusions
1. Primary research contributions:
   - A novel hybrid recognition framework was developed that balances mineral identification accuracy with computational efficiency, providing a scalable solution for automated petrological analysis.
   - The end-to-end workflow reduces analysis time per sample from over 30 min to under 2 min while maintaining verifiable measurement precision, demonstrating strong potential for operational deployment.
2. Implications and applications:
3. Future directions:
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Al-Amri, S.S.; Kalyankar, N.V. Image segmentation by using threshold techniques. arXiv 2010, arXiv:1005.4020.
- Muñoz, X.; Freixenet, J.; Cufí, X.; Martí, J. Strategies for image segmentation combining region and boundary information. Pattern Recogn. Lett. 2003, 24, 375–392.
- Muthukrishnan, R.; Radha, M. Edge detection techniques for image segmentation. Int. J. Comput. Sci. Inf. Technol. 2011, 3, 259.
- Peng, B.; Zhang, L.; Zhang, D. A survey of graph theoretical approaches to image segmentation. Pattern Recogn. 2013, 46, 1020–1038.
- Zhou, Y.; Starkey, J.; Mansinha, L. Segmentation of petrographic images by integrating edge detection and region growing. Comput. Geosci. 2004, 30, 817–831.
- Ross, B.J.; Fueten, F.; Yashkir, D.Y. Automatic mineral identification using genetic programming. Mach. Vis. Appl. 2001, 13, 61–69.
- Jiang, F.; Gu, Q.; Hao, H.; Li, N.; Wang, B.; Hu, X. A method for automatic grain segmentation of multi-angle cross-polarized microscopic images of sandstone. Comput. Geosci. 2018, 115, 143–153.
- Zhang, Y.; Li, M.C.; Han, S. Automatic Lithology Identification and Classification Method Based on Rock Image Deep Learning. Acta Petrol. Sin. 2018, 34, 333–342.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
- Liu, X.; Wang, H.; Jing, H.; Shao, A.; Wang, L. Research on Intelligent Identification of Rock Types Based on Faster R-CNN Method. IEEE Access 2020, 8, 21804–21812.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
- Hu, Q. Deep Learning Classification Method for Rock Thin-Section Images with Multi-Dimensional Information. Master’s Thesis, Zhejiang University, Hangzhou, China, 2019.
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y.; et al. Segment Anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 4015–4026.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Yu, W.; Luo, M.; Zhou, P.; Si, C.; Zhou, Y.; Wang, X.; Feng, J.; Yan, S. MetaFormer Is Actually What You Need for Vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022.
- Naseer, M.M.; Ranasinghe, K.; Khan, S.H.; Hayat, M.; Shahbaz Khan, F.; Yang, M.H. Intriguing Properties of Vision Transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 23296–23308.
- Qin, D.; Leichner, C.; Delakis, M.; Fornoni, M.; Luo, S.; Yang, F.; Wang, W.; Banbury, C.; Ye, C.; Akin, B.; et al. MobileNetV4 - Universal Models for the Mobile Ecosystem. In European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2024.
- Liu, X.; Peng, H.; Zheng, N.; Yang, Y.; Hu, H.; Yuan, Y. EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 14420–14430.
- Folk, R.L.; Ward, W.C. A Study in the Significance of Grain-Size Parameters. J. Sediment. Petrol. 1957, 27, 3–26.
- Cox, E.P. A Method of Assigning Numerical and Percentage Values to the Degree of Roundness of Sand Grains. J. Paleontol. 1927, 1, 179–183.
- SY/T 5368-2016; Identification of Rock Thin Sections. National Energy Administration: Beijing, China, 2016.
- SY/T 5434-2018; Analysis Method for Particle Size of Clastic Rocks. National Energy Administration: Beijing, China, 2018.
- Krumbein, W.C. Size Frequency Distribution of Sediments. J. Sediment. Res. 1934, 4, 65–77.
- Shaanxi Team of Chengdu Geological College. Grain Size Analysis and Application of Sedimentary Rocks (Materials); Geological Publishing House: Beijing, China, 1978; pp. 1–29.
- Friedman, G.M. Distribution Between Dune, Beach, and River Sands from the Textural Characteristics. J. Sediment. Petrol. 1961, 31, 514–519.
- SY/T 6103-2019; Measurement of Rock Pore Structure - Image Analysis Method. National Energy Administration: Beijing, China, 2019.
- Gui, H. Key Technologies for Intelligent Analysis of Sandstone Micrographs. Master’s Thesis, University of Science and Technology of China, Hefei, China, 2022.
- Zhang, Z.Y. Research on Segmentation and Recognition of Sandstone Thin-Section Images. Master’s Thesis, University of Science and Technology of China, Hefei, China, 2020.
- Targ, S.; Almeida, D.; Lyman, K. Resnet in Resnet: Generalizing Residual Architectures. arXiv 2016, arXiv:1603.08029.
- Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; Uszkoreit, J.; et al. An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
Model Component/Indicator | Standard Transformer (ViT-Base) | Improved MetaFormer (Proposed) | Improvement |
---|---|---|---|
Block Structure | 12 × (MSA + FFN) | 4 × (DSConv + FFN), 4 × (Mixed Pooling + FFN), and 4 × (CGA + FFN) | - |
Attention | MSA (Multi-Head Self-Attention) | CGA (Cascaded Group Attention) | - |
Alternative to Token Mixer | - | DSConv + Mixed Pooling | - |
Activation Function | GELU | StarReLU | - |
FLOPs (G) | 17.6 | 5.1 | ↓71% |
Inference Time (ms/image) | 25.3 | 12.1 | ↓52% |
Level | The Range of Rd |
---|---|
Angular | ≤0.5 |
Sub-angular | 0.5–0.7 |
Sub-rounded | 0.7–0.8 |
Rounded | 0.8–0.85 |
Well-rounded | >0.85 |
Grain Size Range (mm) | Φ Value Range | Grain Size Name |
---|---|---|
>2 | <−1 | Conglomerate |
0.0625–2 | 4 to −1 | Sandstone |
0.0039–0.0625 | 8–4 | Siltstone |
<0.0039 | >8 | Mudstone |
Dependent Packages | Version |
---|---|
transformers | 4.28.1 |
matplotlib | 3.5.1 |
numpy | 1.23.0 |
opencv-python | 4.7.0.72 |
pandas | 1.5.3 |
scikit-learn | 1.3.0 |
torch | 2.0.1 + cu118 |
Parameter Category | Parameter Setting | Notes |
---|---|---|
Optimizer | SGD | Momentum = 0.9, weight decay = 10⁻⁴ |
Initial learning rate | 0.001 | Cosine annealing scheduling (T_max = 100) |
Batch size | 2 | High-resolution image memory optimization |
Number of training epochs | 50 | Early stopping (patience = 15) |
Mixed-precision training | AMP enabled | Accelerates training and reduces GPU memory usage |
Parameter Category | Parameter Setting | Notes |
---|---|---|
Optimizer | AdamW | ε = 10⁻⁸, β = (0.965, 0.99) |
Initial learning rate | 10⁻³ | Cosine scheduling (warmup = 5 epochs) |
Batch size | 160 | Massively parallel processing |
Number of training epochs | 60 | Fixed-epoch training |
Mixed-precision training | AMP enabled | Accelerates training and reduces GPU memory usage |
Label smoothing | 0.1 | Mitigates overfitting |
Data augmentation | AutoAugment(‘rand-m9-mstd0.5-inc1’), MixUp(α = 0.8) + CutMix(α = 1.0), Random erasing (probability 0.25) | Automatic augmentation/Mixing augmentation/Simulated occlusion |
Experimental Group | mIoU (%) | mAP@0.5 (%) | Boundary Misjudgment Rate (%) | Key Findings |
---|---|---|---|---|
Our system (Dual-polarized light + Improved Loss) | 92.1 | 90.3 | 3.8 | Optimal boundary segmentation (shown in Figure 11) |
Single-polarized light + Improved Loss | 87.5/↓4.6 | 85.1/↓5.2 | 11.2/↑7.4 | More missed segmentation of lithic grains (similar to Figure 2b) |
Dual-polarized light + Standard Loss | 88.9/↓3.2 | 86.7/↓3.6 | 8.3/↑4.5 | Blurred boundaries (see Figure 2a for repeated segmentation) |
Single-polarized light + Standard Loss | 84.3/↓7.8 | 82.5/↓7.8 | 14.7/↑10.9 | Both missed lithic grains and blurred boundaries |
Component | Precision | Recall | F1-Score |
---|---|---|---|
Quartz | 91.2% | 90.5% | 90.8% |
Lithic fragments | 89.7% | 88.9% | 89.3% |
Feldspar | 90.5% | 90.1% | 90.3% |
Data Size | Our Method (Random Affine Transformation + Flip + Gaussian Blur) | Traditional Method (Cropping + Color Distortion) | Gain |
---|---|---|---|
20,000 | 78.42% | 56.78% | +21.64% |
40,000 | 86.32% | 62.45% | +23.87% |
80,000 | 90.65% | 65.12% | +25.53% |
Assessment Indicator | Mask R-CNN Baseline | Hybrid Framework (SAM + Mask R-CNN + Enhanced MetaFormer) | Gain |
---|---|---|---|
mIoU (%) | 84.3 | 92.1 | ↑7.8% |
Boundary misjudgment rate (%) | 14.7 | 3.8 | ↓10.9% |
Identification rate of mineral components (%) | 74.9 | 90.65 | ↑15.75% |
Porosity measurement deviation (%) | ±2.1 | ±0.3 | ↓85.7% |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Dong, L.; Sun, C.; Yu, X.; Zhang, X.; Chen, M.; Xu, M. Hybrid Architecture for Tight Sandstone: Automated Mineral Identification and Quantitative Petrology. Minerals 2025, 15, 962. https://doi.org/10.3390/min15090962