Hybrid Lithology Identification Method Based on Isometric Feature Mapping Manifold Learning and Particle Swarm Optimization-Optimized LightGBM
Abstract
1. Introduction
2. Research Technique
2.1. Isometric Feature Mapping (ISOMAP)
- (1) Construct the neighborhood graph G over the high-dimensional data points using the k-nearest neighbor method.
- (2) Calculate the geodesic distance matrix $D_G$ between the high-dimensional data points. The geodesic distance $d_G(x_i, x_j)$ is approximated by the length of the shortest path between $x_i$ and $x_j$ on the graph G, which gives the geodesic distance matrix:

  $$D_G = \left[ d_G(x_i, x_j) \right]_{n \times n} \qquad (1)$$

  where n is the number of data points.
- (3) Compute the low-dimensional embedding of the high-dimensional data. Substitute the geodesic distance matrix $D_G$ into the MDS algorithm and calculate the centered inner product matrix B using Equation (2):

  $$B = -\frac{1}{2} H S H, \qquad S_{ij} = d_G(x_i, x_j)^2, \qquad H = I - \frac{1}{n} \mathbf{1}\mathbf{1}^{T} \qquad (2)$$

- (4) Compute the reduced-dimensional data Y. Let $\Lambda$ be the diagonal matrix formed from the d largest eigenvalues of B, and let $a = (a_1, a_2, \ldots, a_d)$, where $a_1, a_2, \ldots, a_d$ are the eigenvectors corresponding to those eigenvalues. The output Y after dimensionality reduction is given by:

  $$Y = \Lambda^{1/2} a^{T} \qquad (3)$$
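The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `isomap`, its parameters, and the use of the Floyd-Warshall algorithm for shortest paths are choices made here for clarity, and the result is returned with one row per sample (the transpose of the d × n layout in step 4).

```python
import numpy as np

def isomap(X, n_neighbors=8, n_components=2):
    """Minimal ISOMAP sketch: kNN graph -> geodesic distances -> classical MDS."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Step 1: k-nearest-neighbor graph G (inf marks "no edge").
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        graph[i, nearest[i]] = dist[i, nearest[i]]
    graph = np.minimum(graph, graph.T)  # make the graph undirected

    # Step 2: geodesic distances as shortest paths on G (Floyd-Warshall).
    for k in range(n):
        graph = np.minimum(graph, graph[:, [k]] + graph[[k], :])
    if np.isinf(graph).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")

    # Step 3: centered inner product matrix B = -1/2 * H * S * H,
    # where S holds the squared geodesic distances.
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ (graph ** 2) @ H

    # Step 4: embedding from the d largest eigenvalues and their eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[top], 0.0, None)  # guard against tiny negative values
    return eigvecs[:, top] * np.sqrt(lam)

# Usage: unroll a half-circle arc (a 1-D manifold embedded in 2-D).
theta = np.linspace(0.0, np.pi, 30)
X = np.column_stack([np.cos(theta), np.sin(theta)])
Y = isomap(X, n_neighbors=2, n_components=1)
print(Y.shape)  # (30, 1)
```

The Floyd-Warshall loop costs O(n³), which is acceptable for a sketch but is one reason ISOMAP is usually described as computationally expensive on large datasets.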
2.2. Particle Swarm Optimization Algorithm
2.3. Light Gradient Boosting Machine (LightGBM)
- (1) LightGBM uses a histogram-based decision tree algorithm. Continuous feature values are discretized into bins. In a single pass over the data, the algorithm accumulates the required statistics in a histogram indexed by bin, and it then searches for the optimal split point over the discrete bin values in the histogram.
- (2) LightGBM uses histogram differencing for acceleration. The histogram of a leaf node can be obtained by subtracting the histogram of its sibling node from the histogram of its parent node, so only one of the two children needs to be built from the data, which can roughly double the speed of histogram construction.
- (3) LightGBM uses a leaf-wise growth algorithm with a depth constraint. It discards the level-wise growth strategy used by most gradient boosting algorithms, which is relatively inefficient because it searches and splits all nodes in the same level without distinction [19]. Leaf-wise growth instead splits the leaf that offers the largest gain, which accelerates computation, while the depth constraint limits tree complexity and helps prevent overfitting.
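The first two ideas above can be illustrated with a small NumPy sketch. It is not LightGBM's actual code: the helper names `build_histogram` and `best_split`, the quantile binning, and the simplified squared-loss gain (unit Hessians, no regularization) are assumptions made here for illustration.

```python
import numpy as np

def build_histogram(bins, grad, n_bins):
    """Accumulate gradient sums and sample counts per bin in one pass."""
    grad_sum = np.bincount(bins, weights=grad, minlength=n_bins)
    count = np.bincount(bins, minlength=n_bins)
    return grad_sum, count

def best_split(grad_sum, count):
    """Scan bin boundaries and return the boundary with the largest gain.

    Simplified gain G_L^2 / n_L + G_R^2 / n_R (unit Hessians assumed).
    """
    g_left = np.cumsum(grad_sum)[:-1]
    n_left = np.cumsum(count)[:-1]
    g_right = grad_sum.sum() - g_left
    n_right = count.sum() - n_left
    valid = (n_left > 0) & (n_right > 0)
    gain = np.where(
        valid,
        g_left ** 2 / np.maximum(n_left, 1) + g_right ** 2 / np.maximum(n_right, 1),
        -np.inf,
    )
    return int(np.argmax(gain))

rng = np.random.default_rng(0)
n_bins = 16
feature = rng.normal(size=1000)
grad = rng.normal(size=1000)

# Discretize the continuous feature into integer bin indices (quantile edges).
edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1)[1:-1])
bins = np.searchsorted(edges, feature)

# Histogram of the parent node and of one child (arbitrary assignment).
parent_hist = build_histogram(bins, grad, n_bins)
go_left = rng.random(1000) < 0.3
left_hist = build_histogram(bins[go_left], grad[go_left], n_bins)

# Histogram differencing: sibling = parent - left, with no second data pass.
right_grad = parent_hist[0] - left_hist[0]
right_count = parent_hist[1] - left_hist[1]

# Check against a histogram built directly from the sibling's samples.
direct = build_histogram(bins[~go_left], grad[~go_left], n_bins)
print(np.allclose(right_grad, direct[0]), np.array_equal(right_count, direct[1]))
print(best_split(*parent_hist))  # index of the best bin boundary
```

In practice one would build the histogram for the smaller child and derive the larger one by subtraction, since the cost of building a histogram scales with the number of samples in the node.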
3. Experimental Data and Processing
3.1. Data Collection
3.2. Feature Extraction
3.3. Algorithm Parameter Setting
Experimental Flow
4. Results and Discussion
4.1. Parameter Optimization
4.2. Results
5. Conclusions
- (1) Manifold learning methods map high-dimensional data to a lower-dimensional space, which allows well logging and cutting logging data to be visualized and reduces the complexity of model construction. As a nonlinear dimensionality reduction method, ISOMAP preserves the local structure and similarity of the data better than linear dimensionality reduction methods such as PCA, ICA, and NMF. When the number of neighbors for ISOMAP is set to eight, the LightGBM model achieves its highest balanced accuracy, 0.829.
- (2) Balanced accuracy is suited to imbalanced rock type datasets. The LightGBM model optimized for balanced accuracy shows more even recognition performance across all rock types than the model optimized for accuracy. Optimizing for balanced accuracy keeps the LightGBM model from being biased toward the majority mudstone samples and effectively improves the recognition accuracy of minority classes in imbalanced rock-type data.
- (3) The PSO algorithm allows the LightGBM model to search the hyperparameter space automatically and find a well-performing hyperparameter configuration, which improves the model's performance and generalization ability. Future research could explore more efficient optimization strategies or other metaheuristic algorithms, such as genetic algorithms or ant colony optimization, to further improve the efficiency and effectiveness of parameter tuning. Ensemble learning techniques such as improved random forests or boosting tree algorithms could also enhance the model's ability to generalize to complex or noisy datasets. Comparison with advanced machine learning models such as deep learning networks would help validate the method and clarify its strengths and weaknesses in processing geological data. Finally, extensive testing on varied geological datasets and in actual oil field projects is essential to assess real-world performance and applicability and to turn these techniques into practical geological analysis tools.
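To make conclusions (2) and (3) concrete, the sketch below computes balanced accuracy on an imbalanced toy label set and runs a basic global-best PSO on a toy objective. It is an illustration under assumptions, not the configuration used in this study: the function names, the inertia and acceleration coefficients (`w`, `c1`, `c2`), and the quadratic `toy_objective` are hypothetical stand-ins. In the actual workflow the objective would be the balanced accuracy of a LightGBM model trained with the candidate hyperparameters.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

def pso_maximize(objective, lower, upper, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO that maximizes `objective` within box bounds."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    pos = rng.uniform(lower, upper, size=(n_particles, lower.size))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[np.argmax(pbest_val)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Velocity update: inertia + pull toward personal and global bests.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([objective(p) for p in pos])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmax(pbest_val)].copy()
    return gbest, float(pbest_val.max())

# 90 majority-class samples (label 0) and 10 minority-class samples (label 1);
# a model that always predicts the majority class looks good on plain accuracy.
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)
print(np.mean(y_true == y_pred))          # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5

# Toy stand-in for "train a model with hyperparameters p and score it".
def toy_objective(p):
    return -((p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2)

best_pos, best_val = pso_maximize(toy_objective, [-10, -10], [10, 10])
print(best_pos, best_val)  # expected to be close to (3, -1) and 0
```

The first two printed values show why the choice of tuning metric matters: a classifier that ignores the minority class entirely still scores 0.9 on accuracy but only 0.5 on balanced accuracy.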
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Fu, G.; Yan, J.; Zhang, K.; Hu, H.; Luo, F. Current status and progress of lithology identification technology. Prog. Geophys. 2017, 32, 26–40.
- Ren, Q.; Zhang, H.; Zhang, D.; Zhao, X.; Yan, L.; Rui, J.; Zeng, F.; Zhu, X. A framework of active learning and semi-supervised learning for lithology identification based on improved naive Bayes. Expert Syst. Appl. 2022, 202, 117278.
- Xu, T.; Chang, J.; Feng, D.; Lv, W.; Kang, Y.; Liu, H.; Li, J.; Li, Z. Evaluation of active learning algorithms for formation lithology identification. J. Pet. Sci. Eng. 2021, 206, 108999.
- Lin, S.; Han, Z.; Li, D.; Zeng, J.; Yang, X.; Liu, X.; Liu, F. Integrating model- and data-driven methods for synchronous adaptive multi-band image fusion. Inf. Fusion 2020, 54, 145–160.
- Ren, Q.; Zhang, H.; Zhang, D.; Zhao, X. Lithology identification using principal component analysis and particle swarm optimization fuzzy decision tree. J. Pet. Sci. Eng. 2023, 220, 111233.
- Xu, Z.; Ma, W.; Lin, P.; Shi, H.; Pan, D.; Liu, T. Deep learning of rock images for intelligent lithology identification. Comput. Geosci. 2021, 154, 104799.
- Sun, Z.; Jiang, B.; Li, X.; Li, J.; Xiao, K. A data-driven approach for lithology identification based on parameter-optimized ensemble learning. Energies 2020, 13, 3903.
- Singh, H.; Seol, Y.; Myshakin, E.M. Automated well-log processing and lithology classification by identifying optimal features through unsupervised and supervised machine-learning algorithms. SPE J. 2020, 25, 2778–2800.
- Xie, Y.; Zhu, C.; Zhou, W.; Li, Z.; Liu, X.; Tu, M. Evaluation of machine learning methods for formation lithology identification: A comparison of tuning processes and model performances. J. Pet. Sci. Eng. 2018, 160, 182–193.
- Liang, H.; Chen, H.; Guo, J.; Bai, J.; Jiang, Y. Research on lithology identification method based on mechanical specific energy principle and machine learning theory. Expert Syst. Appl. 2022, 189, 116142.
- Han, X.; Su, J.; Hong, Y.; Gong, P.; Zhu, D. Mid-to Long-Term Electric Load Forecasting Based on the EMD–Isomap–Adaboost Model. Sustainability 2022, 14, 7608.
- Samko, O.; Marshall, A.D.; Rosin, P.L. Selection of the optimal parameter value for the Isomap algorithm. Pattern Recognit. Lett. 2006, 27, 968–979.
- Anowar, F.; Sadaoui, S.; Selim, B. Conceptual and empirical comparison of dimensionality reduction algorithms (pca, kpca, lda, mds, svd, lle, isomap, le, ica, t-sne). Comput. Sci. Rev. 2021, 40, 100378.
- Wang, D.; Tan, D.; Liu, L. Particle swarm optimization algorithm: An overview. Soft Comput. 2018, 22, 387–408.
- Jain, M.; Saihjpal, V.; Singh, N.; Singh, S.B. An overview of variants and advancements of PSO algorithm. Appl. Sci. 2022, 12, 8392.
- Xing, Z.; Zhu, J.; Zhang, Z.; Qin, Y.; Jia, L. Energy consumption optimization of tramway operation based on improved PSO algorithm. Energy 2022, 258, 124848.
- Wang, D.; Li, L.; Zhao, D. Corporate finance risk prediction based on LightGBM. Inf. Sci. 2022, 602, 259–268.
- Liang, W.; Luo, S.; Zhao, G.; Wu, H. Predicting hard rock pillar stability using GBDT, XGBoost, and LightGBM algorithms. Mathematics 2020, 8, 765.
- Li, L.; Liu, Z.; Shen, J.; Wang, F.; Qi, W.; Jeon, S. A LightGBM-based strategy to predict tunnel rockmass class from TBM construction data for building control. Adv. Eng. Inform. 2023, 58, 102130.
- Liu, Z.; Li, D.; Liu, Y.; Yang, B.; Zhang, Z.-X. Prediction of uniaxial compressive strength of rock based on lithology using stacking models. Rock Mech. Bull. 2023, 2, 100081.
- Vafaei, N.; Ribeiro, R.A.; Camarinha-Matos, L.M. Comparison of normalization techniques on data sets with outliers. Int. J. Decis. Support Syst. Technol. (IJDSST) 2022, 14, 1–17.
- Deng, S.; Pan, H.; Wang, H.; Xu, S.-K.; Yan, X.-P.; Li, C.-W.; Peng, M.-G.; Peng, H.-P.; Shi, L.; Cui, M.; et al. A hybrid machine learning optimization algorithm for multivariable pore pressure prediction. Pet. Sci. 2024, 21, 535–550.
Dimensionality Reduction Technique | Characteristics | Advantages | Limitations | Evaluation Method for Lithology Identification |
---|---|---|---|---|
ISOMAP | Nonlinear dimensionality reduction that preserves the geodesic distances between data points | Better retention of the local structure and similarity of the data | High computational complexity; sensitive to parameter selection | Classify with machine learning models (e.g., SVM, random forest) and compare accuracy on test sets |
PCA | Linear dimensionality reduction that finds the principal components of the data by maximizing variance | Effective and computationally simple for linearly distributed data | Important nonlinear structural information may be lost | Compare visual clustering quality and classification model precision |
ICA | Linear dimensionality reduction that maximizes the independence between components | Suitable for source signal separation; emphasizes the independence of components | Limited ability to recognize signals from non-independent sources | Test the performance of the classification model on the reduced data |
NMF | Linear dimensionality reduction that decomposes the data into non-negative matrices | Suitable for non-negative data, such as image data | The non-negativity requirement on the data limits the scope of application | Analyze the clustering quality and classification accuracy on geological data |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, G.; Deng, S.; Xu, S.; Li, C.; Wei, W.; Zhang, H.; Li, C.; Gong, W.; Pan, H. Hybrid Lithology Identification Method Based on Isometric Feature Mapping Manifold Learning and Particle Swarm Optimization-Optimized LightGBM. Processes 2024, 12, 1593. https://doi.org/10.3390/pr12081593