Search Results (281)

Search Parameters:
Keywords = nearest neighbor searching

27 pages, 3823 KiB  
Article
A CAD-Based Method for 3D Scanning Path Planning and Pose Control
by Jing Li, Pengfei Su, Ligang Qu, Guangming Lv and Wenhui Qian
Aerospace 2025, 12(8), 654; https://doi.org/10.3390/aerospace12080654 - 23 Jul 2025
Viewed by 202
Abstract
To address the technical bottlenecks of low path-planning efficiency and insufficient point cloud coverage in the automated 3D scanning of complex structural components, this study proposes an offline method for generating and optimizing scanning paths from CAD models. Discrete sampling of the model's surface is achieved by constructing an oriented bounding box (OBB) and applying a line–triangular-mesh intersection algorithm, yielding a discrete point set of the model. Combining normal-vector analysis of the discrete points with the kinematic constraints of the scanning system, a scanner pose parameter calculation model is established. An improved nearest neighbor search algorithm generates a globally optimized scanning path, and an adaptive B-spline interpolation algorithm smooths the path. A joint MATLAB (R2023b) and RobotStudio (6.08) simulation platform supports the entire process, from model pre-processing and path planning to path verification. The experimental results demonstrate that, compared with traditional manual teaching methods, the proposed approach achieves a 25.4% improvement in scanning efficiency and an 18.6% increase in point cloud coverage when measuring typical complex structural components. This study offers an intelligent solution for the efficient and accurate measurement of large-scale complex parts and holds significant potential for broad engineering applications.
(This article belongs to the Section Aeronautics)
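For readers unfamiliar with the baseline, here is a minimal Python sketch of greedy nearest-neighbor ordering of sampled scan points, the construction the paper's improved search builds on. The KD-tree query budget of 16 and the random point cloud are illustrative assumptions; the paper's pose model and B-spline smoothing are not reproduced.

```python
# Greedy nearest-neighbor ordering of sampled scan points (baseline sketch;
# the paper's improved search and B-spline smoothing are not reproduced here).
import numpy as np
from scipy.spatial import cKDTree

def nearest_neighbor_path(points, start=0):
    """Order points by repeatedly visiting the closest unvisited point."""
    tree = cKDTree(points)
    n = len(points)
    visited = np.zeros(n, dtype=bool)
    order = [start]
    visited[start] = True
    for _ in range(n - 1):
        # Query several neighbors so at least one is usually unvisited.
        dists, idxs = tree.query(points[order[-1]], k=min(n, 16))
        nxt = next((i for i in idxs if not visited[i]), None)
        if nxt is None:  # all 16 nearest already visited: brute-force fallback
            remaining = np.where(~visited)[0]
            d = np.linalg.norm(points[remaining] - points[order[-1]], axis=1)
            nxt = remaining[np.argmin(d)]
        visited[nxt] = True
        order.append(nxt)
    return points[np.array(order)]

path = nearest_neighbor_path(np.random.rand(500, 3))  # 500 sampled surface points
```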

21 pages, 874 KiB  
Article
Explainable Use of Foundation Models for Job Hiring
by Vishnu S. Pendyala, Neha Bais Thakur and Radhika Agarwal
Electronics 2025, 14(14), 2787; https://doi.org/10.3390/electronics14142787 - 11 Jul 2025
Viewed by 1110
Abstract
Automating candidate shortlisting is a non-trivial task that stands to benefit substantially from advances in artificial intelligence. We evaluate a suite of foundation models, including Llama 2, Llama 3, Mixtral, Gemma-2b, Gemma-7b, Phi-3 Small, Phi-3 Mini, Zephyr, and Mistral-7b, for their ability to predict hiring outcomes in both zero-shot and few-shot settings. Using only features extracted from applicants’ submissions, these models, on average, achieved an AUC above 0.5 in zero-shot settings. Providing a few examples of similar job applicants, selected by nearest neighbor search, improved prediction marginally, indicating that the models perform competently even without task-specific fine-tuning. For Phi-3 Small and Mixtral, all reported performance metrics fell within the 95% confidence interval across evaluation strategies. Model outputs were interpreted quantitatively via post hoc explainability techniques and qualitatively through prompt engineering, revealing that decisions are largely attributable to knowledge acquired during pre-training. A task-specific MLP classifier trained solely on the provided dataset outperformed the strongest foundation model (Zephyr in the 5-shot setting) by only about 3 percentage points on accuracy, while all the foundation models outperformed this baseline by more than 15 percentage points on F1 and recall, underscoring the competitive strength of general-purpose language models in the hiring domain.
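A hedged sketch of the few-shot selection idea: retrieve the k nearest labeled applicants and fold them into the prompt. The embedding matrix, label encoding, and prompt format below are illustrative assumptions, not the paper's pipeline.

```python
# Selecting few-shot exemplars for a hiring prompt by nearest neighbor search
# over applicant feature vectors (embeddings and prompt format are invented).
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
train_X = rng.normal(size=(200, 32))          # labeled applicant embeddings
train_y = rng.integers(0, 2, size=200)        # 1 = hired, 0 = not hired
query = rng.normal(size=(1, 32))              # new applicant's embedding

nn = NearestNeighbors(n_neighbors=5).fit(train_X)
_, idx = nn.kneighbors(query)

shots = "\n".join(
    f"Applicant features: {train_X[i].round(2).tolist()[:4]}... -> "
    f"{'hired' if train_y[i] else 'not hired'}"
    for i in idx[0]
)
prompt = f"{shots}\nApplicant features: ... -> ?"  # passed to the foundation model
```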

27 pages, 1630 KiB  
Article
NNG-Based Secure Approximate k-Nearest Neighbor Query for Large Language Models
by Heng Zhou, Yuchao Wang, Yi Qiao and Jin Huang
Mathematics 2025, 13(13), 2199; https://doi.org/10.3390/math13132199 - 5 Jul 2025
Viewed by 276
Abstract
Large language models (LLMs) have driven transformative progress in artificial intelligence, yet critical challenges persist in data management and privacy protection during model deployment and training. The approximate nearest neighbor (ANN) search, a core operation in LLMs, faces inherent trade-offs between efficiency and security when implemented through conventional locality-sensitive hashing (LSH)-based secure ANN (SANN) methods, which often compromise query accuracy due to false positives. To address these limitations, this paper proposes a novel secure ANN scheme based on a nearest neighbor graph (NNG-SANN), designed to ensure the security of approximate k-nearest neighbor queries for the vector data commonly used in LLMs. Specifically, a secure indexing structure and a subset partitioning method are proposed based on LSH and the NNG. The approach uses neighborhood information stored in the NNG to supplement subset data, significantly reducing the impact of false positive points generated by LSH on query results and thereby effectively improving query accuracy. To ensure data privacy, we incorporate a symmetric encryption algorithm that encrypts the data subsets obtained through greedy partitioning before storing them on the server, providing robust security guarantees. Furthermore, we construct a secure index table that enables complete candidate set retrieval through a single query, ensuring our solution completes the search process in one interaction while minimizing communication costs. Comprehensive experiments on two datasets of different scales demonstrate that the proposed method outperforms existing state-of-the-art algorithms in both query accuracy and security, effectively meeting the precision and security requirements for nearest neighbor queries in LLMs.
(This article belongs to the Special Issue Privacy-Preserving Machine Learning in Large Language Models (LLMs))
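A toy sketch of the core supplementation idea under random-hyperplane LSH: an LSH bucket's candidates are extended with each member's nearest-neighbor-graph neighbors before scoring. The paper's secure index table and symmetric encryption layer are deliberately omitted, and all parameters are invented.

```python
# LSH buckets alone miss true neighbors (false negatives), so candidates are
# supplemented with each bucket member's NNG neighbors. Encryption omitted.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 64))
planes = rng.normal(size=(16, 64))                     # 16-bit LSH signature

def signature(v):
    return tuple((planes @ v > 0).astype(int))

buckets = {}
for i, v in enumerate(data):
    buckets.setdefault(signature(v), set()).add(i)

# Precomputed nearest-neighbor graph: each point's 5 closest points.
norms = (data ** 2).sum(1)
d2 = norms[:, None] + norms[None, :] - 2 * data @ data.T
nng = np.argsort(d2, axis=1)[:, 1:6]

def candidates(q):
    cand = set(buckets.get(signature(q), set()))
    for i in list(cand):                               # NNG supplementation
        cand.update(nng[i])
    return cand

q = rng.normal(size=64)
cand = candidates(q)
best = min(cand, key=lambda i: ((data[i] - q) ** 2).sum()) if cand else None
```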

21 pages, 1097 KiB  
Article
An Industry Application of Secure Augmentation and Gen-AI for Transforming Engineering Design and Manufacturing
by Dulana Rupanetti, Corissa Uberecken, Adam King, Hassan Salamy, Cheol-Hong Min and Samantha Schmidgall
Algorithms 2025, 18(7), 414; https://doi.org/10.3390/a18070414 - 4 Jul 2025
Viewed by 385
Abstract
This paper explores the integration of Large Language Models (LLMs) and secure Gen-AI technologies within engineering design and manufacturing, with a focus on improving inventory management, component selection, and recommendation workflows. The system is intended for deployment and evaluation in a real-world industrial environment. It utilizes vector embeddings, vector databases, and Approximate Nearest Neighbor (ANN) search algorithms to implement Retrieval-Augmented Generation (RAG), enabling context-aware searches for inventory items and addressing the limitations of traditional text-based methods. Built on an LLM framework enhanced by RAG, the system performs similarity-based retrieval and part recommendations while preserving data privacy through selective obfuscation using the ROT13 algorithm. In collaboration with an industry sponsor, real-world testing demonstrated strong results: 88.4% for Answer Relevance, 92.1% for Faithfulness, 80.2% for Context Recall, and 83.1% for Context Precision. These scores reflect the system’s ability to deliver accurate and relevant responses while retrieving meaningful context and minimizing irrelevant information. Overall, the approach presents a practical and privacy-aware solution for manufacturing, bridging the gap between traditional inventory tools and modern AI capabilities and enabling more intelligent workflows in design and production processes.
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)
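A minimal sketch of the two named ingredients, assuming placeholder embeddings: ROT13 obfuscation of part records and cosine-similarity retrieval for the RAG context. The embedding function is a stand-in, not the system's actual model.

```python
# ROT13 obfuscation of sensitive part descriptions plus cosine-similarity
# retrieval over their embeddings (placeholder embeddings, illustrative data).
import codecs
import numpy as np

parts = ["M6 titanium hex bolt", "6061-T6 aluminum bracket", "NEMA 17 stepper"]
obfuscated = [codecs.encode(p, "rot13") for p in parts]   # shifts letters only

rng = np.random.default_rng(2)
emb = rng.normal(size=(len(parts), 128))                  # placeholder embeddings
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

def retrieve(query_vec, k=2):
    q = query_vec / np.linalg.norm(query_vec)
    scores = emb @ q                                      # cosine similarity
    top = np.argsort(scores)[::-1][:k]
    # De-obfuscate only the retrieved records before building the RAG context.
    return [codecs.encode(obfuscated[i], "rot13") for i in top]

context = retrieve(rng.normal(size=128))
```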

28 pages, 631 KiB  
Article
A Predictive Framework for Sustainable Human Resource Management Using tNPS-Driven Machine Learning Models
by R Kanesaraj Ramasamy, Mohana Muniandy and Parameswaran Subramanian
Sustainability 2025, 17(13), 5882; https://doi.org/10.3390/su17135882 - 26 Jun 2025
Viewed by 419
Abstract
This study proposes a predictive framework that integrates machine learning techniques with Transactional Net Promoter Score (tNPS) data to enhance sustainable human resource management. A synthetically generated dataset, simulating real-world employee feedback across divisions and departments, was used to classify employee performance and engagement levels. Six machine learning models (XGBoost, TabNet, Random Forest, Support Vector Machines, K-Nearest Neighbors, and Neural Architecture Search) were applied to predict high-performing and at-risk employees. XGBoost achieved the highest accuracy and robustness across key performance metrics, including precision, recall, and F1-score. The findings demonstrate the potential of combining real-time sentiment data with predictive analytics to support proactive HR strategies. By enabling early intervention, data-driven workforce planning, and continuous performance monitoring, the proposed framework contributes to long-term employee satisfaction, talent retention, and organizational resilience, aligning with sustainable development goals in human capital management.
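A self-contained sketch of this kind of cross-model comparison on synthetic tNPS-style features; XGBoost, TabNet, and Neural Architecture Search are replaced by scikit-learn stand-ins so the example runs without extra dependencies.

```python
# Several classifiers scored on the same synthetic tNPS-style features.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=12, random_state=0)
models = {
    "GBM (XGBoost stand-in)": GradientBoostingClassifier(),
    "Random Forest": RandomForestClassifier(),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}
for name, model in models.items():
    f1 = cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: F1 = {f1:.3f}")
```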

26 pages, 4690 KiB  
Proceeding Paper
Wage Rates and Job Requirements Prediction: An Application to Logistics Online Job Postings Using Search Tools and Web Scraping
by Khoa Huu Dang Tran, Huong Quynh Nguyen, Hang My Hanh Le, Lina Doan Tran and Nhi To Yen Tran
Eng. Proc. 2025, 97(1), 32; https://doi.org/10.3390/engproc2025097032 - 17 Jun 2025
Viewed by 475
Abstract
This paper predicts offered wage rates and job requirements in the logistics industry by utilizing data from online job postings collected through two methods: search tools and web scraping. We apply conventional estimation techniques, such as ordinary least squares and kernel density estimation, to analyze the collected data. Additionally, for the first time, we employ nowcasting methods (linear regression, decision tree, and K-nearest neighbor methods) in this context to generate robust results. Our main findings are as follows. First, the average real wage derived from online job postings aligns with officially published GDP per capita data for the studied countries and regions. Second, we identify significantly positive causal effects of work experience on real wages in the logistics industry. Third, skill requirements exhibit year-over-year variations. Finally, the decision tree method produces nowcasts closest to the actual web-scraped data. The proposed methodologies and findings establish a reliable approach using search tools and web scraping to define and predict labor demand for stakeholders in this sector as well as others.
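A toy illustration of the KNN nowcasting step on synthetic posting features; the features, wage equation, and coefficients are invented for the example.

```python
# KNN nowcast of posted wages from job-posting features (synthetic data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)
X = np.column_stack([rng.uniform(0, 10, 400),      # years of experience
                     rng.integers(1, 8, 400)])     # number of listed skills
wage = 2500 + 400 * X[:, 0] + 150 * X[:, 1] + rng.normal(0, 300, 400)

X_tr, X_te, y_tr, y_te = train_test_split(X, wage, random_state=0)
knn = KNeighborsRegressor(n_neighbors=7).fit(X_tr, y_tr)
print("R^2 on held-out postings:", round(knn.score(X_te, y_te), 3))
```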

39 pages, 2511 KiB  
Review
The Evolution of Machine Learning in Vibration and Acoustics: A Decade of Innovation (2015–2024)
by Jacek Lukasz Wilk-Jakubowski, Lukasz Pawlik, Damian Frej and Grzegorz Wilk-Jakubowski
Appl. Sci. 2025, 15(12), 6549; https://doi.org/10.3390/app15126549 - 10 Jun 2025
Cited by 1 | Viewed by 1100
Abstract
The increasing demands for the reliability of modern industrial equipment and structures necessitate advanced techniques for design, monitoring, and analysis. This review article presents the latest research advancements in the application of machine learning techniques to vibration and acoustic signal analysis from 2015 to 2024. A total of 96 peer-reviewed scientific publications were examined, selected using a systematic Scopus-based search. The main research areas include processes such as modeling and design, health management, condition monitoring, non-destructive testing, damage detection, and diagnostics. In the context of these processes, a review of machine learning techniques was conducted, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory (LSTM), autoencoders, support vector machines (SVMs), decision trees (DTs), nearest neighbor search (NNS), K-means clustering, and random forests. These techniques were applied across a wide range of engineering domains, including civil infrastructure, transportation systems, energy installations, and rotating machinery. Additionally, this article analyzes contributions from different countries, highlighting temporal and methodological trends in this field. The findings indicate a clear shift towards deep learning-based methods and multisensor data fusion, accompanied by increasing use of automatic feature extraction and interest in transfer learning, few-shot learning, and unsupervised approaches. This review aims to provide a comprehensive understanding of the current state and future directions of machine learning applications in vibration and acoustics, outlining the field’s evolution and identifying its key research challenges and innovation trajectories.
(This article belongs to the Special Issue Machine Learning in Vibration and Acoustics 2.0)

18 pages, 368 KiB  
Article
Stacked Ensemble Learning for Classification of Parkinson’s Disease Using Telemonitoring Vocal Features
by Bolaji A. Omodunbi, David B. Olawade, Omosigho F. Awe, Afeez A. Soladoye, Nicholas Aderinto, Saak V. Ovsepian and Stergios Boussios
Diagnostics 2025, 15(12), 1467; https://doi.org/10.3390/diagnostics15121467 - 9 Jun 2025
Viewed by 744
Abstract
Background: Parkinson’s disease (PD) is a progressive neurodegenerative condition that impairs motor and non-motor functions. Early and accurate diagnosis is critical for effective management and care. Leveraging machine learning (ML) techniques, this study aimed to develop a robust prediction system for PD using a stacked ensemble learning approach, addressing challenges such as imbalanced datasets and feature optimization. Methods: An open-access PD dataset comprising 22 vocal attributes and 195 instances from 31 subjects was utilized. To prevent data leakage, subjects were divided into training (22 subjects) and testing (9 subjects) groups, ensuring no subject appeared in both sets. Preprocessing included data cleaning and normalization via min–max scaling. The synthetic minority oversampling technique (SMOTE) was applied exclusively to the training set to address class imbalance. Feature selection techniques—forward search, gain ratio, and Kruskal–Wallis test—were employed using subject-wise cross-validation to identify significant attributes. The developed system combined support vector machine (SVM), random forest (RF), K-nearest neighbor (KNN), and decision tree (DT) as base classifiers, with logistic regression (LR) as the meta-classifier in a stacked ensemble learning framework. Performance was evaluated using both recording-wise and subject-wise metrics to ensure clinical relevance. Results: The stacked ensemble learning model achieved realistic performance with a recording-wise accuracy of 84.7% and subject-wise accuracy of 77.8% on completely unseen subjects, outperforming individual classifiers including KNN (81.4%), RF (79.7%), and SVM (76.3%). Cross-validation within the training set showed 89.2% accuracy, with the performance difference highlighting the importance of proper validation methodology. Feature selection results showed that using the top 10 features ranked by gain ratio provided optimal balance between performance and clinical interpretability. The system’s methodological robustness was validated through rigorous subject-wise evaluation, demonstrating the critical impact of validation methodology on reported performance. Conclusions: By implementing subject-wise validation and preventing data leakage, this study demonstrates that proper validation yields substantially different (and more realistic) results compared to flawed recording-wise approaches. The findings underscore the critical importance of validation methodology in healthcare ML applications and provide a template for methodologically sound PD classification research. Future research should focus on validating the model with larger, multi-center datasets and implementing standardized validation protocols to enhance clinical applicability.
(This article belongs to the Special Issue Machine-Learning-Based Disease Diagnosis and Prediction)
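A sketch of the validation design the abstract emphasizes, on synthetic data: subject-wise splitting, SMOTE applied to the training set only, and the SVM/RF/KNN/DT base learners with an LR meta-classifier. The imbalanced-learn package is assumed to be installed.

```python
# Subject-wise split + SMOTE on training data only + stacked ensemble.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupShuffleSplit
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(195, 22))                 # 22 vocal features, 195 recordings
y = (rng.random(195) < 0.75).astype(int)       # imbalanced PD labels (synthetic)
subjects = rng.integers(0, 31, 195)            # 31 subjects, several recordings each

train_idx, test_idx = next(GroupShuffleSplit(test_size=9/31, random_state=0)
                           .split(X, y, groups=subjects))
X_tr, y_tr = SMOTE(random_state=0).fit_resample(X[train_idx], y[train_idx])

stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("rf", RandomForestClassifier()),
                ("knn", KNeighborsClassifier()),
                ("dt", DecisionTreeClassifier())],
    final_estimator=LogisticRegression(),
)
stack.fit(X_tr, y_tr)
print("subject-wise accuracy:", stack.score(X[test_idx], y[test_idx]))
```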

24 pages, 2044 KiB  
Article
Bregman–Hausdorff Divergence: Strengthening the Connections Between Computational Geometry and Machine Learning
by Tuyen Pham, Hana Dal Poz Kouřimská and Hubert Wagner
Mach. Learn. Knowl. Extr. 2025, 7(2), 48; https://doi.org/10.3390/make7020048 - 26 May 2025
Viewed by 923
Abstract
The purpose of this paper is twofold. On the technical side, we propose an extension of the Hausdorff distance from metric spaces to spaces equipped with asymmetric distance measures. Specifically, we focus on extending it to the family of Bregman divergences, which includes the popular Kullback–Leibler divergence (also known as relative entropy). The resulting dissimilarity measure is called a Bregman–Hausdorff divergence and compares two collections of vectors without assuming any pairing or alignment between their elements. We propose new algorithms for computing Bregman–Hausdorff divergences based on a recently developed Kd-tree data structure for nearest neighbor search with respect to Bregman divergences. The algorithms are surprisingly efficient even for large inputs with hundreds of dimensions. As a benchmark, we use the new divergence to compare two collections of probabilistic predictions produced by different machine learning models trained using the relative entropy loss. In addition to introducing this technical concept, we provide a survey that outlines the basics of Bregman geometry and motivates the Kullback–Leibler divergence using concepts from information theory. We also describe computational geometric algorithms that have been extended to this geometry, focusing on algorithms relevant to machine learning.
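A brute-force sketch of one directed Bregman–Hausdorff divergence under the Kullback–Leibler divergence: the largest, over points p in P, of the smallest KL(p||q) over q in Q. The paper's Kd-tree acceleration of the inner minimization is not reproduced here.

```python
# Directed Bregman-Hausdorff divergence under KL, computed by brute force.
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence KL(p||q) for strictly positive vectors."""
    return float(np.sum(p * np.log(p / q)))

def bregman_hausdorff_directed(P, Q):
    """max over p in P of min over q in Q of KL(p||q)."""
    return max(min(kl(p, q) for q in Q) for p in P)

rng = np.random.default_rng(5)
P = rng.dirichlet(np.ones(10), size=50)    # two collections of probability vectors
Q = rng.dirichlet(np.ones(10), size=60)
print(bregman_hausdorff_directed(P, Q))
```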

10 pages, 322 KiB  
Proceeding Paper
Optimizing Brain Tumor Classification: Integrating Deep Learning and Machine Learning with Hyperparameter Tuning
by Vijaya Kumar Velpula, Kamireddy Rasool Reddy, K. Naga Prakash, K. Prasanthi Jasmine and Vadlamudi Jyothi Sri
Eng. Proc. 2025, 87(1), 64; https://doi.org/10.3390/engproc2025087064 - 12 May 2025
Viewed by 568
Abstract
Brain tumors significantly impact global health and pose serious challenges for accurate diagnosis due to their diverse nature and complex characteristics. Effective diagnosis and classification are essential for selecting the best treatment strategies and forecasting patient outcomes. Currently, histopathological examination of biopsy samples is the standard method for brain tumor identification and classification. However, this method is invasive, time-consuming, and prone to human error. To address these limitations, a fully automated approach is proposed for brain tumor classification. Recent advancements in deep learning, particularly convolutional neural networks (CNNs), have shown promise in improving the accuracy and efficiency of tumor detection from magnetic resonance imaging (MRI) scans. In response, a model was developed that integrates machine learning (ML) and deep learning (DL) techniques. The process began by splitting the data into training, testing, and validation sets. Images were then resized and cropped to enhance model quality and efficiency. Relevant texture features were extracted using a modified Visual Geometry Group (VGG) architecture. These features were fed into various supervised ML models, including support vector machine (SVM), k-nearest neighbors (KNN), logistic regression (LR), stochastic gradient descent (SGD), random forest (RF), and AdaBoost, with GridSearchCV used for hyperparameter tuning. The model’s performance was evaluated using key metrics such as accuracy, precision, recall, F1-score, and specificity. Experimental results demonstrate that the proposed approach offers a robust and automated solution for brain tumor classification, achieving accuracies of 94.02% with VGG19 and 96.30% with VGG16. This model can significantly assist healthcare professionals in early tumor detection and in improving diagnostic accuracy.
(This article belongs to the Proceedings of The 5th International Electronic Conference on Applied Sciences)
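A sketch of the pipeline shape, assuming TensorFlow/Keras is available: frozen VGG16 features feeding an SVM tuned with GridSearchCV. The images are random stand-ins for preprocessed MRI slices, and the grid is illustrative.

```python
# Frozen VGG16 features -> SVM with GridSearchCV (placeholder "MRI" data).
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")
images = np.random.rand(40, 224, 224, 3) * 255          # stand-in image batch
features = extractor.predict(preprocess_input(images))  # (40, 512) vectors
labels = np.tile(np.arange(4), 10)                      # 4 balanced tumor classes

grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}, cv=3)
grid.fit(features, labels)
print(grid.best_params_, grid.best_score_)
```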

26 pages, 3740 KiB  
Article
An Improved Spider Wasp Optimizer for Green Vehicle Route Planning in Flower Collection
by Mengxin Lu and Shujuan Wang
Appl. Sci. 2025, 15(9), 4992; https://doi.org/10.3390/app15094992 - 30 Apr 2025
Cited by 1 | Viewed by 322
Abstract
Flower collection constitutes a critical segment of the flower logistics chain, and its efficiency significantly influences the industry. However, the energy consumption and carbon emissions that occur in the flower collection process present a great challenge for realizing efficient flower collection. To this end, this study proposes a green vehicle route planning model that incorporates multiple cost factors, including fixed costs, refrigeration costs, and transportation costs, to minimize the total cost under hard time window constraints. Moreover, a Genetic Neighborhood Comprehensive Spider Wasp Algorithm (GN_CSWA) is proposed to solve this problem. Random generation and the nearest neighbor algorithm are employed to construct the initial solution, followed by roulette selection, elite selection, and a best-individual retention strategy to refine the population for the next iteration. A crossover operator is applied to facilitate global exploration, while six neighborhood search operators further enhance the quality of the solution. In addition, to prevent the algorithm from converging to a local optimum, two mutation operators are introduced to generate new solutions. The effectiveness of the proposed optimizer is validated through extensive experimental results.
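A minimal sketch of the nearest-neighbor construction used for initial solutions: starting from the depot, repeatedly visit the closest unvisited pickup point. The model's time-window, capacity, and cost terms are omitted, and the coordinates are invented.

```python
# Nearest-neighbor route construction over a distance matrix (depot at index 0).
import numpy as np

rng = np.random.default_rng(6)
coords = rng.uniform(0, 100, size=(12, 2))     # index 0 = depot, 1..11 = farms
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)

def nearest_neighbor_route(dist):
    unvisited = set(range(1, len(dist)))
    route, cur = [0], 0
    while unvisited:
        cur = min(unvisited, key=lambda j: dist[cur, j])  # closest unvisited
        route.append(cur)
        unvisited.remove(cur)
    return route + [0]                          # return to depot

route = nearest_neighbor_route(dist)
cost = sum(dist[a, b] for a, b in zip(route, route[1:]))
```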

30 pages, 4529 KiB  
Article
Credit Rating Model Based on Improved TabNet
by Shijie Wang and Xueyong Zhang
Mathematics 2025, 13(9), 1473; https://doi.org/10.3390/math13091473 - 30 Apr 2025
Viewed by 862
Abstract
Under the rapid evolution of financial technology, traditional credit risk management paradigms relying on expert experience and singular algorithmic architectures have proven inadequate in addressing complex decision-making demands arising from dynamically correlated multidimensional risk factors and heterogeneous data fusion. This manuscript proposes an enhanced credit rating model based on an improved TabNet framework. First, the Kaggle “Give Me Some Credit” dataset undergoes preprocessing, including data balancing and partitioning into training, testing, and validation sets. Subsequently, the model architecture is refined through the integration of a multi-head attention mechanism to extract both global and local feature representations. Bayesian optimization is then employed to accelerate hyperparameter selection and automate a parameter search for TabNet. To further enhance classification and predictive performance, a stacked ensemble learning approach is implemented: the improved TabNet serves as the feature extractor, while XGBoost (Extreme Gradient Boosting), LightGBM (Light Gradient Boosting Machine), CatBoost (Categorical Boosting), KNN (K-Nearest Neighbors), and SVM (Support Vector Machine) are selected as base learners in the first layer, with XGBoost acting as the meta-learner in the second layer. The experimental results demonstrate that the proposed TabNet-based credit rating model outperforms benchmark models across multiple metrics, including accuracy, precision, recall, F1-score, AUC (Area Under the Curve), and KS (Kolmogorov–Smirnov statistic).
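A hedged PyTorch sketch of the general idea of multi-head self-attention over tabular features: each feature becomes a token embedding and attention mixes global and local interactions. Every dimension here is an illustrative assumption, not the paper's configuration.

```python
# Multi-head self-attention over tabular feature tokens (illustrative dims).
import torch
import torch.nn as nn

class TabularAttentionBlock(nn.Module):
    def __init__(self, n_features, dim=32, heads=4):
        super().__init__()
        self.embed = nn.Linear(1, dim)                  # one token per feature value
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(n_features * dim, dim)

    def forward(self, x):                               # x: (batch, n_features)
        tokens = self.embed(x.unsqueeze(-1))            # (batch, n_features, dim)
        mixed, _ = self.attn(tokens, tokens, tokens)    # self-attention mixing
        return self.out(mixed.flatten(1))               # pooled representation

feats = TabularAttentionBlock(n_features=10)(torch.randn(8, 10))
print(feats.shape)  # torch.Size([8, 32])
```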

15 pages, 1008 KiB  
Article
BoxRF: A New Machine Learning Algorithm for Grade Estimation
by Ishmael Anafo, Rajive Ganguli and Narmandakh Sarantsatsral
Appl. Sci. 2025, 15(8), 4416; https://doi.org/10.3390/app15084416 - 17 Apr 2025
Viewed by 723
Abstract
A new machine learning algorithm, BoxRF, was developed specifically for estimating grades from drillhole datasets. The method combines the features of classical estimation methods, such as search boxes, search direction, and estimation based on inverse distance methods, with the robustness of random forest (RF) methods that come from forming numerous random groups of data. The method was applied to a porphyry copper deposit, and results were compared to various ML methods, including XGBoost (XGB), k-nearest neighbors (KNN), neural nets (NN), and RF. Scikit-learn RF (SRF) performed the best (R2 = 0.696) among the ML methods but underperformed BoxRF (R2 = 0.751). The results were confirmed through a five-fold cross-validation exercise where BoxRF once again outperformed SRF. The box dimensions that performed the best were similar in length to the ranges indicated by variogram modeling, thus demonstrating a link between machine learning and traditional methods. Numerous combinations of hyperparameters performed similarly well, implying the method is robust. The inverse distance method was found to better represent the grade–space relationship in BoxRF than median values. The superiority of BoxRF over SRF in this dataset is encouraging, as it opens the possibility of improving machine learning by incorporating domain knowledge (principles of geology, in this case).
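A sketch of the inverse-distance aggregation inside a single search box, the estimator BoxRF was found to prefer over the median. The data, box extents, and power parameter are invented for illustration.

```python
# Inverse-distance-weighted grade estimate from samples inside one search box.
import numpy as np

rng = np.random.default_rng(7)
xyz = rng.uniform(0, 100, size=(500, 3))       # drillhole sample locations
grade = rng.lognormal(mean=0.0, sigma=0.5, size=500)

def idw_in_box(target, lo, hi, p=2.0):
    inside = np.all((xyz >= lo) & (xyz <= hi), axis=1)
    pts, g = xyz[inside], grade[inside]
    if len(g) == 0:
        return np.nan                          # empty box: no estimate
    d = np.linalg.norm(pts - target, axis=1)
    w = 1.0 / np.maximum(d, 1e-9) ** p         # guard against zero distance
    return float((w * g).sum() / w.sum())

est = idw_in_box(np.array([50, 50, 50]), lo=np.array([30, 30, 30]),
                 hi=np.array([70, 70, 70]))
```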

22 pages, 7978 KiB  
Article
Research on High Spatiotemporal Resolution of XCO2 in Sichuan Province Based on Stacking Ensemble Learning
by Zhaofei Li, Na Zhao, Han Zhang, Yang Wei, Yumin Chen and Run Ma
Sustainability 2025, 17(8), 3433; https://doi.org/10.3390/su17083433 - 11 Apr 2025
Viewed by 443
Abstract
Global warming caused by the increase in atmospheric CO2 content has become a focal environmental issue of common concern to the international community. As a key resource support for achieving the “dual carbon” goals in Western China, Sichuan Province requires a deep analysis of its carbon sources, carbon sinks, and atmospheric environmental capacity, which is of great significance for formulating effective regional sustainable development strategies and responding to global climate change. In view of the unique geographical and climatic conditions in Sichuan Province and its low and unevenly distributed atmospheric environmental capacity, this paper uses multi-source satellite data from OCO-2, OCO-3, and GOSAT, combined with other auxiliary data, to generate a daily XCO2 concentration dataset on a 1 km grid for Sichuan Province from 2015 to 2022. Optuna-based optimization with 10-fold cross-validation is used to search for the optimal hyperparameter configuration of the Stacking model’s four base learners (random forest, gradient boosting decision tree, extreme gradient boosting, and K-nearest neighbors); the logistic regression algorithm then serves as the second-layer meta-learner, effectively improving the prediction accuracy and generalization ability of the Stacking ensemble learning model. Comparing model performance via cross-validation and TCCON site verification, the Stacking model significantly improved in accuracy, with an R2, RMSE, and MAE of 0.983, 0.87 ppm, and 0.19 ppm, respectively, outperforming traditional models such as RF, KNN, XGBoost, and GBRT. Accuracy verification of the model’s estimated atmospheric XCO2 against observation data from the two TCCON stations at Xianghe and Hefei showed correlation coefficients of 0.96 and 0.98 and MAEs of 0.657 ppm and 0.639 ppm, respectively, further confirming the model’s high accuracy and reliability. At the same time, the fusion of multi-source satellite data significantly improved the spatial coverage of XCO2 concentration data in Sichuan Province, effectively filling the gaps in single-satellite observations. Based on the reconstructed XCO2 dataset of Sichuan Province, the study revealed significant regional and seasonal differences in XCO2 concentrations: concentrations are higher in spring and winter and lower in summer and autumn, and spatially they are high in the east and low in the west. This study helps deepen our understanding of the carbon cycle and climate change and can provide a scientific basis and risk assessment methods for policy formulation, effect evaluation, and international cooperation.
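A sketch of the Optuna-plus-10-fold-CV pattern for one base learner (random forest), with the XCO2 dataset replaced by a synthetic regression problem; the search ranges are illustrative, and Optuna is assumed to be installed.

```python
# Optuna hyperparameter search with 10-fold CV for a random forest regressor.
import optuna
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

def objective(trial):
    model = RandomForestRegressor(
        n_estimators=trial.suggest_int("n_estimators", 50, 300),
        max_depth=trial.suggest_int("max_depth", 3, 15),
        random_state=0,
    )
    return cross_val_score(model, X, y, cv=10, scoring="r2").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```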

18 pages, 3723 KiB  
Article
Ultra-Short-Term Load Forecasting for Extreme Scenarios Based on DBSCAN-RSBO-BiGRU-KNN-Attention with Fine-Tuning Strategy
by Leibao Wang, Jifeng Liang, Jiawen Li, Yonghui Sun, Hongzhu Tao, Qiang Wang and Tengkai Yu
Processes 2025, 13(4), 1161; https://doi.org/10.3390/pr13041161 - 11 Apr 2025
Viewed by 445
Abstract
Extreme scenarios involving abnormal load fluctuations pose serious challenges to the safe and stable operation of power systems. To address these challenges, an ultra-short-term load forecasting model is proposed, specifically designed for extreme conditions. The model combines density-based spatial clustering of applications with noise (DBSCAN), random search Bayesian optimization (RSBO), bidirectional gated recurrent units (BiGRUs), k-nearest neighbors (KNN), and an attention mechanism, enhanced by a fine-tuning strategy to improve forecasting accuracy. First, the original load data are reconstructed weekly, and extreme scenarios are identified using DBSCAN. Second, the RSBO is employed to optimize model parameters within the high-dimensional search space. To further refine performance, the final fully connected layer is fine-tuned to adapt to extreme conditions. Finally, case studies demonstrate that the proposed approach reduces the root mean square error (RMSE) by 12.37% and the mean absolute error (MAE) by 6.73% compared to benchmark models, achieving superior accuracy under all tested extreme scenarios.
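A toy sketch of the extreme-scenario identification step: weekly load profiles are clustered with DBSCAN, and weeks labeled as noise (-1) are treated as extreme. The profiles are synthetic, and eps/min_samples are invented values that would need tuning on real data.

```python
# DBSCAN over weekly load profiles; noise labels (-1) mark extreme weeks.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(8)
weeks = rng.normal(1.0, 0.05, size=(52, 168))      # 52 weeks x 168 hourly loads
weeks[[10, 40]] *= 1.6                             # two abnormal (extreme) weeks

labels = DBSCAN(eps=2.0, min_samples=3).fit_predict(weeks)
extreme_weeks = np.where(labels == -1)[0]
print(extreme_weeks)                               # ideally [10 40]
```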
