MDPI - Publisher of Open Access Journals

12 pages, 996 KB

Open Access Article

Lack of Evidence for Well-Separated Clinical Phenotypes in Surgically Treated Infective Endocarditis Using Routine Clinical Variables: A Machine Learning Approach

by Diego Sangiorgi, Elisa Mikus, Mariafrancesca Fiorentino, Antonino Costantino, Simone Calvi, Elena Tenti, Anna Milione and Carlo Savini

Mach. Learn. Knowl. Extr. 2026, 8(6), 154; https://doi.org/10.3390/make8060154 - 4 Jun 2026

Viewed by 240

Abstract

Background: Infective endocarditis (IE) is characterized by marked heterogeneity in microbiological etiology, clinical presentation, valvular involvement, and patient complexity, which complicates risk stratification. Unsupervised machine learning has been proposed to identify latent clinical phenotypes in complex diseases; however, whether IE exhibits a natural cluster structure remains unclear. Methods: In a cohort of 739 patients undergoing surgery for IE, unsupervised clustering was performed using K-medoids based on Gower distance to account for mixed-type variables, which is a common scenario in clinical settings. The optimal number of clusters was selected by maximizing the average silhouette width and the gap statistic. Density and semi-parametric algorithms (K-prototypes, KAMILA, hierarchical clustering, and HDBSCAN) were applied as a sensitivity analysis. Differences in postoperative outcomes across clusters were explored using logistic regression. Results: K-medoids clustering identified three patient groups; however, the average silhouette width was low (0.129), indicating very weak separation between clusters. Sensitivity analysis confirmed the absence of a natural cluster structure. Despite this, a descriptive comparison of forced clusters revealed a gradient of clinical severity, with one group characterized by older age, higher comorbidity burden, complex infection features, and worse postoperative outcomes. Conclusions: Unsupervised clustering did not identify natural clinical phenotypes in surgically treated IE, likely reflecting the extreme intrinsic heterogeneity of the disease. Although forced clustering highlighted clinically interpretable gradients of risk, these groups should not be considered true latent phenotypes. Alternative approaches, such as continuous risk modeling, may be more appropriate for patient stratification in IE.
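To make the Gower distance and average silhouette width mentioned in the abstract above concrete, here is a minimal pure-Python sketch. It is not the authors' pipeline: the toy patient rows, the feature choice, and the helper names `gower_distance` and `mean_silhouette` are illustrative assumptions, and the K-medoids step itself is omitted (cluster labels are supplied by hand).

```python
from typing import List, Optional, Sequence


def gower_distance(a: Sequence, b: Sequence, ranges: Sequence[Optional[float]]) -> float:
    """Gower distance between two mixed-type rows.

    ranges[i] is (max - min) of numeric feature i, or None if feature i is categorical.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            total += 0.0 if x == y else 1.0   # categorical: simple mismatch
        elif r > 0:
            total += abs(x - y) / r           # numeric: range-scaled absolute difference
        # r == 0 means a constant numeric feature, which contributes nothing
    return total / len(ranges)


def mean_silhouette(dist: List[List[float]], labels: Sequence[int]) -> float:
    """Average silhouette width from a precomputed distance matrix (needs >= 2 clusters)."""
    n = len(labels)
    clusters = set(labels)
    scores = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)                # usual convention for singleton clusters
            continue
        a = sum(own) / len(own)               # mean distance to own cluster
        b = min(                              # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Hypothetical toy rows: (age in years, sex, affected valve).
rows = [(70, "M", "aortic"), (72, "M", "aortic"), (35, "F", "mitral"), (38, "F", "mitral")]
ranges = [72 - 35, None, None]
dist = [[gower_distance(p, q, ranges) for q in rows] for p in rows]

print(mean_silhouette(dist, [0, 0, 1, 1]))    # well-separated toy labelling
print(mean_silhouette(dist, [0, 1, 0, 1]))    # deliberately poor labelling
```

In this toy example the first labelling gives a silhouette close to 1 and the second a negative value; a value near 0, such as the 0.129 reported above, sits between these extremes and indicates overlapping groups.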

(This article belongs to the Section Learning)

28 pages, 4088 KB

Open Access Article

Research on the Flat Field Measurement Method of Coronagraph

by Yulong Feng, Xuefei Zhang, Hongfei Liang, Yu Liu, Mingzhe Sun, Tengfei Song and Mingyu Zhao

Universe 2026, 12(6), 165; https://doi.org/10.3390/universe12060165 - 3 Jun 2026

Viewed by 212

Abstract

The solar corona has an extremely low density, and its brightness is only about one millionth of that of the photosphere. High-dynamic-range imaging of its faint structure is therefore essential for studying coronal heating, coronal mass ejections, and space weather. Quantitative coronagraph imaging requires flat-field measurement and calibration, which underpin intensity calibration, small-scale feature detection, and long-term cyclic analysis. This paper analyzes the coronagraph imaging chain (baffle–optical system–detector) and the origins of flat-field errors, including optical aberrations, stray light, and pixel-response non-uniformity, and summarizes the resulting calibration requirements of next-generation coronagraphs. On this basis, ground-based and space-based flat-fielding methods are systematically reviewed: the ground-based methods include integrating-sphere uniform light sources, opal glass/diffuser plates, clear-sky and thin-cloud backgrounds, and solar disk scanning, while the space-based methods include internal light sources and diffuser plates, attitude-roll and off-corona offset observations, and multi-phase statistical self-consistent flat-fielding. Their accuracy, resource cost, and applicability are compared. The review shows that no single method is simultaneously high-precision, easy to update, and engineer-friendly; a hierarchical, multi-method calibration framework is therefore recommended. Finally, a new method is proposed in which lithographically generated structured light fields, combined with Fourier optics and machine learning inversion, are used to estimate the pixel-response function. Preliminary experiments show that this method achieves a lower residual error than the integrating-sphere and opal glass methods, providing a high-precision reference for future wide-band, high-resolution coronagraph calibration.

(This article belongs to the Section Solar and Stellar Physics)

23 pages, 7448 KB

Open Access Article

Enhanced Pedotransfer Functions Through Optuna-Optimized Extreme Gradient Boosting: Application to Soil Water Retention Modeling

by Sanaz Monavvar Sabegh, Davoud Zarehaghi, Saeed Samadianfard, Mohammad Taghi Sattari and Sajjad Ahmad

Earth 2026, 7(3), 94; https://doi.org/10.3390/earth7030094 - 2 Jun 2026

Viewed by 238

Abstract

Soil water retention curves (SWRCs) are fundamental inputs for simulating vadose-zone processes, yet their direct measurement is labor-intensive and often impractical across large spatial domains. Pedotransfer functions (PTFs), therefore, provide an essential alternative for estimating SWRCs from readily measured soil properties. This study developed machine learning-based PTFs to estimate SWRCs using the UNSODA 2.0 database. An extreme gradient boosting (XGB) model was implemented and optimized using two Bayesian hyperparameter tuning frameworks, Hyperopt and Optuna, across eleven input scenarios incorporating combinations of textural, structural, and compositional soil attributes. Model performance was assessed using RMSE, R², and Kling–Gupta efficiency (KGE). To prevent data leakage from the hierarchical structure of the UNSODA 2.0 database, a nested grouped cross-validation framework was employed, ensuring an unbiased assessment of model generalization performance across independent soil samples. The Optuna-tuned XGB model trained on the full feature set achieved the highest accuracy, with a test RMSE of 0.0183, R² of 0.9815, and KGE of 0.9825, outperforming both the baseline and Hyperopt-optimized models. Feature importance and SHAP analyses indicated that soil texture dominated the estimations, while porosity, bulk density, and organic matter provided complementary improvements and particle density contributed marginally. These findings demonstrate that advanced hyperparameter optimization enhances the accuracy and interpretability of XGB-based PTFs, offering a robust framework for improved estimation of SWRCs in hydrological and soil-management applications.
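The abstract above reports RMSE and Kling–Gupta efficiency (KGE) without defining them. As a reference point, the sketch below implements RMSE and the original 2009 form of KGE, KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²), where r is the linear correlation, α the ratio of standard deviations, and β the ratio of means. Which KGE variant the study used is not stated, so treating it as the 2009 form is an assumption, and the water-content values are made up.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def kge(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Kling-Gupta efficiency (2009 form): 1 is a perfect match, lower is worse."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    if std_o == 0 or std_s == 0 or mean_o == 0:
        raise ValueError("KGE is undefined for constant series or a zero observed mean")
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim)) / n
    r = cov / (std_o * std_s)      # correlation component
    alpha = std_s / std_o          # variability ratio
    beta = mean_s / mean_o         # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Made-up volumetric water contents, for illustration only.
observed = [0.10, 0.20, 0.30, 0.40]
doubled = [2 * v for v in observed]   # perfectly correlated but biased and over-dispersed

print(kge(observed, observed), rmse(observed, observed))
print(kge(observed, doubled))
```

Because KGE penalizes correlation, variability, and bias separately, the doubled series scores poorly even though its correlation with the observations is perfect.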

24 pages, 6450 KB

Open Access Article

Integrated Predictive-Maintenance Framework for EV Batteries Using Short-Horizon SoH Forecasting, Degradation Warning, and Acceleration Risk Detection

by Ch. Hadassa Parimala, P. Srinivasa Varma, Ch. Paul Bakht Singh and Alagar Karthick

World Electr. Veh. J. 2026, 17(6), 286; https://doi.org/10.3390/wevj17060286 - 28 May 2026

Viewed by 266

Abstract

Precise battery-health monitoring and rapid degradation detection are essential for improving the security, durability, and efficacy of electric vehicles (EVs). This study proposes a hierarchical predictive-maintenance framework that incorporates short-term State-of-Health (SoH) forecasting, mid-term deterioration alarms, and degradation acceleration risk modeling into a temporally consistent machine learning architecture. Rolling-origin cross-validation is used to preserve the chronological order of the data and prevent information leakage. The predictive core is an ensemble that integrates Random Forest, Extremely Randomized Trees, and Histogram-Based Gradient Boosting. Validation-driven model blending and training-only feature selection are used to improve generalizability. The one-hour SoH forecasting model for short-horizon monitoring achieves an R² of 0.9254, an RMSE of 0.0033, and a MAPE of 0.32%. The seven-day degradation-warning model, intended for early anomaly detection and proactive maintenance scheduling, achieves an area under the curve (AUC) of 0.7838 and a recall of 0.8205. In addition, the degradation acceleration risk module can identify rapid health decline with a robustness of 0.8796 and a precision–recall AUC of 0.7101 when operating under significant stress. Reliability is further assessed through validation on scenarios that simulate severe temperature and stress conditions. The proposed multi-layer ensemble structure thus supports intelligent predictive maintenance of electric vehicle battery packs.
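Rolling-origin cross-validation, as named in the abstract above, trains on data up to a cut-off and tests on the block that follows, then moves the cut-off forward. The sketch below shows one common variant with an expanding training window; whether the study used an expanding or a sliding window is not stated, and the function name and parameters are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple


def rolling_origin_splits(
    n_samples: int,
    initial_train: int,
    horizon: int,
    step: Optional[int] = None,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs in chronological order.

    The training window always starts at index 0 and grows by `step` each fold
    (expanding window); the test window is the `horizon` samples right after it.
    """
    if initial_train <= 0 or horizon <= 0:
        raise ValueError("initial_train and horizon must be positive")
    step = horizon if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    train_end = initial_train
    while train_end + horizon <= n_samples:
        yield list(range(train_end)), list(range(train_end, train_end + horizon))
        train_end += step


splits = list(rolling_origin_splits(n_samples=10, initial_train=4, horizon=2))
for train_idx, test_idx in splits:
    print(train_idx, test_idx)
```

Every test index is later than every training index in the same fold, which is the property that prevents future observations from leaking into training.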

(This article belongs to the Section Storage Systems)

28 pages, 7826 KB

Open Access Article

Nationwide Solar Radiation Zoning and Performance Comparison of Empirical and Deep Learning Models

by Bing Hui, Qian Zhang, Lei Hou, Yan Zhang, Qinghua Shi, Guoqing Chen and Junhui Wang

Appl. Sci. 2026, 16(9), 4229; https://doi.org/10.3390/app16094229 - 26 Apr 2026

Viewed by 276

Abstract

Accurate solar radiation estimation is critical for optimizing solar energy applications. This study divided 819 meteorological stations in China into six solar radiation zones using k-means, hierarchical, and bisecting k-means clustering based on daily relative sunshine duration. Correlation analysis and feature importance evaluation were conducted to quantify the contributions of key meteorological variables. A comparison of models considering regional heterogeneity was performed. Six sunshine-based empirical models, three machine learning models (Random Forest, Support Vector Machine, and Extreme Gradient Boosting), and two deep learning models (Long Short-Term Memory and Gated Recurrent Unit) were systematically evaluated across 98 stations with observed solar radiation data. Model performance was assessed using the coefficient of determination (R²), mean absolute error (MAE), root mean square error (RMSE), and normalized RMSE (NRMSE). Results showed that k-means clustering outperformed the other two methods and was adopted for final zoning. The correlation analysis identified sunshine duration (S), extraterrestrial radiation (R_a), temperature difference (ΔT), and maximum temperature (T_max) as the dominant influencing factors, with clear regional heterogeneity. The deep learning models, particularly LSTM (R² = 0.939, RMSE = 1.702 MJ m⁻² d⁻¹, MAE = 1.319 MJ m⁻² d⁻¹, NRMSE = 0.046), achieved the highest accuracy, followed by GRU, XGB, SVM, and RF. Among the empirical models, Model 5 performed best in Zones 1, 3, 4, and 5, while Model 6 was optimal in Zones 2 and 6. The key novelty of the study is an integrated zoning–prediction framework for regional solar radiation estimation, combining clustering validation, correlation analysis, empirical model calibration, and deep learning benchmarking, with enhanced physical interpretability and prediction accuracy.

27 pages, 5739 KB

Open Access Article

Baseline-Conditioned Spatial Heterogeneity in Ensemble-Learning Correction for Global Hourly Sea-Level Reconstruction

by Yu Hao, Yixuan Tang, Wen Du, Yang Li and Min Xu

J. Mar. Sci. Eng. 2026, 14(8), 697; https://doi.org/10.3390/jmse14080697 - 8 Apr 2026

Viewed by 647

Abstract

This study examines how assessments of coastal extreme sea levels depend on the separability and reconstructability of the astronomical tide in hourly sea-level records. Using a global tide-gauge network, it proposes an ensemble-learning correction framework that integrates a physical-baseline threshold with multi-criteria consistency testing to determine whether machine-learning enhancement is genuinely effective across stations and time windows. The analysis uses hourly records from 528 UHSLC tide gauges, with 31-day short sequences used to reconstruct 180-day sea-level variability. Taking the physical tidal model as the baseline, residuals are corrected using Extremely Randomized Trees, Random Forest, and Gradient Boosting. To avoid false improvement driven solely by error reduction, a hierarchical decision framework is established. Baseline model quality is first screened using NSE and the coefficient of determination, after which mathematical artefacts are identified through diagnostics of peak suppression and variance shrinkage. A five-level classification is then derived from the convergent evidence of twelve performance metrics and four statistical significance tests. The results show a consistent global pattern across all three algorithms. Approximately 57% of stations meet the criterion for genuine improvement, whereas about 42% are associated with an unreliable physical baseline, indicating that the dominant source of failure arises not from the ensemble-learning algorithms themselves, but from spatially varying limitations in the underlying physical baseline. Spatially, the credibility of machine-learning correction is strongly conditioned by baseline quality: stations with effective correction are more continuous along the eastern North Atlantic and European coasts, whereas stations with ineffective correction are more concentrated in the Gulf of Mexico, the Caribbean, and the marginal seas and archipelagic regions of the western Pacific. 
These results indicate that the observed spatial heterogeneity primarily reflects geographically varying physical and dynamical conditions that control baseline reliability and residual learnability, rather than a standalone difference in the intrinsic capability of ensemble learning itself.

(This article belongs to the Special Issue AI-Enhanced Dynamics and Reliability Analysis of Marine Structures)

39 pages, 8897 KB

Open Access Article

Research on Improved Transformer Fault Diagnosis Method Driven by IBKA-VMD and Hierarchical Fractional Order Attention Entropy Synergy

by Jingzong Yang, Xuefeng Li and Min Mao

Fractal Fract. 2026, 10(3), 195; https://doi.org/10.3390/fractalfract10030195 - 16 Mar 2026

Cited by 1 | Viewed by 519

Abstract

Rolling bearing faults are the primary cause of rotating machinery failure. Under complex operating conditions, the weak fault impact signals are easily overwhelmed by strong noise and exhibit significant non-stationary characteristics, posing severe challenges to accurate diagnosis. To address this, this paper proposes an improved Transformer-based fault diagnosis method driven by the improved black-winged kite algorithm-variational mode decomposition (IBKA-VMD) and hierarchical fractional-order attention entropy (HFrAttE). The method employs the integrated multi-strategy IBKA to adaptively determine the optimal parameters of VMD, utilizes HFrAttE to construct highly discriminative feature sets, and further builds an improved Transformer model integrating bidirectional attention mechanisms and feature decoupling structures for deep feature mining. The classification decision is finalized by the twin extreme learning machine (TELM). Experimental results on the Case Western Reserve University (CWRU) bearing dataset under different noise environments (−2 dB, −5 dB) demonstrate that the proposed method maintains 100% accuracy, recall, and F1-score under −5 dB noise interference, significantly outperforming comparative models. It exhibits excellent anti-noise performance and feature extraction capability, providing an efficient solution for intelligent operation and maintenance of rotating machinery under complex operating conditions.

(This article belongs to the Section Engineering)

25 pages, 4245 KB

Open Access Article

Comprehensive Early Alert and Adaptive Local Response Framework for Wildfire Risk in Transmission Line Corridors Using Coupled Global Factors in Power System

by Tianliang Xue, Chengsi Xiang, Xi Chen and Lei Zhang

Processes 2026, 14(5), 752; https://doi.org/10.3390/pr14050752 - 25 Feb 2026

Viewed by 397

Abstract

Escalating global climate change has intensified the frequency and scale of wildfires in mountainous regions hosting transmission line infrastructure. These conflagrations act as extreme meteorological events, capable of generating localized heatwaves that compromise the air insulation of power lines and trigger protective relay operations, thereby posing systemic threats to regional grid stability. To enhance wildfire early-warning efficacy for grid security, this study formulates wildfire early warning for power transmission corridors as a regression-based risk prediction problem and proposes a hierarchical “global screening–local refinement” risk assessment framework. The primary contribution of this study lies in the integration of a machine-learning-based global wildfire risk screening model with tower-level spatial refinement using geographically weighted regression (GWR), enabling coordinated global–local wildfire risk characterization along power transmission corridors. The framework employs a predictive model built on a Gradient Boosting Decision Tree (GBDT) algorithm, integrating geospatial and statistical analyses. A global risk model, utilizing historical data from the Himawari-8 satellite alongside meteorological, topographic, and anthropogenic variables, produces a composite risk index. This index is spatially interpolated via Kriging to generate stratified wildfire risk maps for broad-area assessment. For precise corridor-level analysis, these globally projected risk indices, along with localized terrain features, inter-tower clearance distances, and proximity to historical ignition points, are incorporated into a GWR model. This yields a spatially calibrated wildfire risk index along critical routes. The results show that the GBDT-based model achieved the best predictive performance among the evaluated regression models, with an R² of 0.626 and a mean squared error of 0.178. 
This approach offers a scientifically robust and operationally viable reference for wildfire prevention strategies in power line maintenance.

(This article belongs to the Special Issue AI-Driven Innovations for Enhancing Power System Stability and Operational Efficiency)

20 pages, 580 KB

Open Access Article

A Maturation-Aware Machine Learning Framework for Screening the Nutritional Status of Adolescents

by Hatem Ghouili, Zouhaier Farhani, Narimen Yousfi, Halil İbrahim Ceylan, Amel Dridi, Andrea de Giorgio, Nicola Luigi Bragazzi, Noomen Guelmami, Ismail Dergaa and Anissa Bouassida

Nutrients 2026, 18(4), 660; https://doi.org/10.3390/nu18040660 - 17 Feb 2026

Cited by 1 | Viewed by 907

Abstract

Background: Malnutrition in adolescents remains a significant public health issue worldwide, with undernutrition and overweight often coexisting. Accurate nutritional screening during adolescence is complicated by variability in biological maturation and class imbalance, particularly among underweight adolescents. Objective: This study aims to develop and validate machine learning models for classifying the nutritional status of adolescents, accounting for class imbalance and biological maturation, and to evaluate model stability and variable importance at different stages of peak height velocity (PHV). Methods: In this cross-sectional study, 4232 adolescents aged 11 to 18 years were recruited from nine educational institutions in Tunisia. Their nutritional status was classified according to the International Obesity Task Force (IOTF) BMI thresholds into three categories: underweight (14.4%), normal weight (68.3%), and overweight (17.2%). Ten anthropometric, behavioral, and maturation-related predictors were analyzed. Six supervised machine learning algorithms were evaluated using a 70/30 stratified split between training and test sets, with five-fold cross-validation. Class imbalance was addressed by ROSE combined with cost-sensitive learning. Model performance was assessed using accuracy, Cohen’s kappa coefficient, macro F1 score, sensitivity, specificity, and AUC. Results: The cost-sensitive Random Forest (RF) model achieved the best overall performance, with an accuracy of 0.830, a macro F1 score of 0.767, a macro-AUC of 0.921, and a macro-sensitivity of 0.743. The class-specific sensitivities were 0.70 (underweight), 0.91 (normal weight), and 0.62 (overweight), with no major misclassification between the extreme categories. Performance remained stable across the different maturation phases (accuracy from 0.823 to 0.839), with optimal discrimination in the pre-PHV (macro-AUC = 0.936; sensitivity for underweight = 0.82) and post-PHV (macro-AUC = 0.931) periods. 
Body mass was the main predictor (importance = 1.00), followed by waist circumference (0.34–0.53). The importance of age for classifying underweight increased significantly from the pre-PHV (0.10) to the post-PHV (0.75) period. A two-stage hierarchical model further improved underweight detection (stage 1 AUC = 0.911; sensitivity = 0.732). Conclusions: A cost-sensitive RF model, combined with ROSE, provides robust classification of adolescents’ nutritional status across maturation stages, significantly improving underweight detection while preserving overall accuracy. This approach is particularly well-suited to public health screening in schools as a first-stage assessment that requires clinical confirmation and promotes a maturation-aware interpretation of nutritional risk among adolescents.

(This article belongs to the Special Issue Nutrition-Based Counseling and Interventions for Chronic Disease Prevention)

27 pages, 2135 KB

Open Access Article

Optimization of Farmland Cultivated Land Path Based on Hybrid Adaptive Neighborhood Search Algorithm

by Han Lv, Zhixin Yao and Taihong Zhang

Sensors 2026, 26(4), 1202; https://doi.org/10.3390/s26041202 - 12 Feb 2026

Viewed by 587

Abstract

Path planning for large-scale agricultural fields faces challenges such as irregular field shapes, uncertain boundaries, and the need to balance path efficiency, energy consumption, and coverage quality. To address these problems, this research introduces a strategy-aware hierarchical hybrid optimization framework (HANS) for autonomous agricultural operations. This framework introduces a global principal axis extraction method based on Principal Component Analysis (PCA), utilizing the statistical distribution of field boundaries to guide path direction, thereby improving robustness against boundary noise and irregular geometries. The framework integrates Adaptive Large Neighborhood Search (ALNS) for global exploration and Tabu Search (TS) for local optimization, forming a tightly coordinated hybrid structure. The framework further employs a Pareto-set-based decision support selection strategy to solve a multi-objective optimization model encompassing machine kinematics, turning patterns, and energy-aware cost evaluation. This strategy provides three methods: weighted preference-based compromise solution selection, crowding distance-based diversified solution selection, and single-objective extreme value-based dedicated optimization solution selection. To balance the impact of path length, energy consumption, and coverage rate, we assigned equal or nearly equal weights to them (i.e., (0.33, 0.33, 0.34)). Furthermore, the framework incorporates operators and feedback learning mechanisms specific to agricultural coverage path problems to enable adaptive operator selection and reduce reliance on manual parameter tuning. Simulation results under three representative field scenarios show that compared to fixed-direction planning, HANS improves the average coverage rate by 0.51 percentage points and reduces fuel consumption by 4.34%. 
Compared to Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Tabu Search (TS), and Simulated Annealing (SA), the proposed method shortens the working path length by 0.37–0.83%, improves coverage rate by 0.34–1.11%, and reduces energy consumption by 0.61–1.03%, while maintaining competitive computational costs. These results demonstrate the effectiveness and practicality of HANS in large-scale autonomous farming operations.
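The weighted preference-based compromise selection described in the abstract above can be illustrated with a small sketch: filter candidate paths down to the Pareto front, min-max normalize each objective, and pick the candidate with the lowest weighted sum using the stated weights (0.33, 0.33, 0.34). This is a generic weighted-sum selection under stated assumptions, not the HANS implementation; the candidate values and function names are hypothetical, and coverage is expressed as an uncovered fraction so that all three objectives are minimized.

```python
from typing import List, Sequence, Tuple

Costs = Tuple[float, float, float]  # (path length, energy use, uncovered fraction), lower is better


def dominates(a: Costs, b: Costs) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Costs]) -> List[Costs]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        p for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def weighted_compromise(front: Sequence[Costs], weights: Sequence[float]) -> Costs:
    """Pick the front member with the lowest weighted sum of min-max normalized costs."""
    lows = [min(col) for col in zip(*front)]
    spans = [max(col) - lo for col, lo in zip(zip(*front), lows)]

    def score(p: Costs) -> float:
        return sum(
            w * ((v - lo) / span if span > 0 else 0.0)
            for w, v, lo, span in zip(weights, p, lows, spans)
        )

    return min(front, key=score)


# Hypothetical candidate paths, for illustration only.
A: Costs = (1010.0, 500.0, 0.02)
B: Costs = (1050.0, 480.0, 0.01)
C: Costs = (1100.0, 520.0, 0.03)   # worse than A on every objective
D: Costs = (980.0, 530.0, 0.04)
candidates = [A, B, C, D]
weights = (0.33, 0.33, 0.34)       # weights quoted in the abstract

front = pareto_front(candidates)
best = weighted_compromise(front, weights)
print(front)
print(best)
```

With near-equal weights the selection favors candidates that are balanced across objectives, here the one that gives up some path length for the best energy use and coverage.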

(This article belongs to the Special Issue Robotic Systems for Future Farming)

15 pages, 2981 KB

Open Access Article

Capacity-Limited Failure in Approximate Nearest Neighbor Search on Image Embedding Spaces

by Morgan Roy Cooper and Mike Busch

J. Imaging 2026, 12(2), 55; https://doi.org/10.3390/jimaging12020055 - 25 Jan 2026

Viewed by 1050

Abstract

Similarity search on image embeddings is a common practice for image retrieval in machine learning and pattern recognition systems. Approximate nearest neighbor (ANN) methods enable scalable similarity search on large datasets, often approaching sub-linear complexity. Yet, little empirical work has examined how ANN neighborhood geometry differs from that of exact k-nearest neighbors (k-NN) search as the neighborhood size increases under constrained search effort. This study quantifies how approximate neighborhood structure changes relative to exact k-NN search as k increases across three experimental conditions. Using multiple random subsets of 10,000 images drawn from the STL-10 dataset, we compute ResNet-50 image embeddings, perform an exact k-NN search, and compare it to a Hierarchical Navigable Small World (HNSW)-based ANN search under controlled hyperparameter regimes. We evaluated the fidelity of neighborhood structure using neighborhood overlap, average neighbor distance, normalized barycenter shift, and local intrinsic dimensionality (LID). Results show that exact k-NN and ANN search behave nearly identically when efSearch > k. However, as the neighborhood size grows and efSearch remains fixed, ANN search fails abruptly, exhibiting extreme divergence in neighbor distances at approximately k ≈ 2–3.5 × efSearch. Increasing index construction quality delays this failure, and scaling search effort proportionally with neighborhood size (efSearch = α × k with α ≥ 1) preserves neighborhood geometry across all evaluated metrics, including LID. The findings indicate that ANN search preserves neighborhood geometry within its operational capacity but abruptly fails when this capacity is exceeded. Documenting this behavior is relevant for scientific applications that approximate embedding spaces and provides practical guidance on when ANN search is interchangeable with exact k-NN and when geometric differences become nontrivial.

(This article belongs to the Section Image and Video Processing)

24 pages, 2735 KB

Open Access Article

Hierarchical Data Fusion Algorithm for Multiple Wind Speed Sensors in Anemometer Tower

by Junhong Duan, Hailong Zhang, Chao Tu, Jun Song, Wei Niu, Zhen Zhang, Jinze Han and Jiuyuan Huo

Sensors 2026, 26(2), 565; https://doi.org/10.3390/s26020565 - 14 Jan 2026

Viewed by 547

Abstract

Accurate and reliable wind speed measurement is essential for applications such as wind power generation and meteorological monitoring. Data fusion from multiple anemometers mounted on wind measurement towers is a key approach to obtaining high-precision wind speed information. In this study, a hierarchical data fusion strategy is proposed to enhance both the quality and efficiency of multi-sensor fusion on wind measurement towers. At the local fusion stage, multi-sensor wind speed data are denoised and fused using an unscented Kalman filter enhanced with fuzzy logic and a robustness factor (FLR-UKF). At the global decision fusion stage, decision-level fusion is achieved through an extreme learning machine (ELM) neural network optimized by a Q-learning-improved Aquila optimizer (QLIAO-ELM). By incorporating a spiral surrounding attack mechanism and a Q-learning-based adaptive strategy, QLIAO-ELM significantly enhances global search capability and convergence speed, enabling the ELM network to obtain superior parameters within limited computational time. Consequently, the accuracy and efficiency of decision fusion are improved. Experimental results show that, during the local fusion phase, the RMSE of FLR-UKF is reduced by 26.46% to 28.6% compared to the traditional UKF; during the global fusion phase, the RMSE of QLIAO-ELM is reduced by 27.1% and 14.0% compared to ELM and ISSA-ELM, respectively.

(This article belongs to the Special Issue Sensor Fusion: Kalman Filtering for Engineering Applications)

32 pages, 2805 KB

Open AccessArticle

Geologically Constrained Multi-Scale Transformer for Lithology Identification Under Extreme Class Imbalance

by Xiao Li, Puhong Feng, Baohua Yu, Chun-Ping Li, Junbo Liu and Jie Zhao

Eng 2026, 7(1), 8; https://doi.org/10.3390/eng7010008 - 25 Dec 2025

Cited by 1 | Viewed by 907

Abstract

Accurate lithology identification is essential in oil and gas exploration because it directly affects reservoir evaluation and development planning. In complex reservoirs with extreme class imbalance, where critical minority lithologies account for less than 5%, identification accuracy is severely constrained. Recent deep learning methods, including convolutional neural networks, recurrent architectures, and transformer-based models, have achieved substantial improvements over traditional machine learning approaches in identifying lithology. These methods capture spatial patterns and sequential dependencies in well log data effectively, reaching recognition accuracy of up to 85–88% under moderate imbalance. However, when they are extended to complex reservoirs under extreme class imbalance, three major limitations emerge: (1) single-scale architectures, such as CNNs or standard Transformers, cannot simultaneously capture thin-layer details smaller than 0.5 m and regional geological trends larger than 2 m; (2) generic imbalance handling techniques, including focal loss alone or basic SMOTE, prove insufficient for extreme ratios larger than 50:1; and (3) conventional Transformers lack depth-dependent attention mechanisms that incorporate stratigraphic continuity principles. This paper proposes a geologically constrained multi-scale Transformer framework tailored for 1D well-log sequences under extreme imbalance larger than 50:1. The approach addresses the extreme imbalance through deep-feature fusion and advanced class-rebalancing strategies. The framework integrates multi-scale convolutional feature extraction using 1 × 3, 1 × 5, and 1 × 7 kernels, hierarchical attention mechanisms with depth-aware position encoding based on Walther’s Law to model long-range dependencies, and adaptive three-stage class rebalancing through SMOTE–Tomek hybrid resampling, focal loss, and CReST self-training. Experimental validation on 32,847 logging samples demonstrates significant improvements: overall accuracy reaches 90.3%, minority class F1 scores improve by 20–25 percentage points (argillaceous siltstone 73.5%, calcareous sandstone 68.2%, coal seams 65.8%), and a G-mean of 0.804 confirms balanced recognition. Notably, the framework maintains stable performance even under extreme class imbalance at ratios of up to 100:1, with minority class F1 scores above 64%, representing a two-fold improvement over state-of-the-art methods in a regime where earlier Transformer-based approaches degrade. This work provides a technical foundation for the intelligent transformation of oil and gas exploration, with broad application prospects.
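Two quantities named in the abstract, focal loss and G-mean, can be illustrated with a short sketch. This is not the authors' implementation. It assumes the standard per-sample focal loss form, with a default `gamma=2.0` that is a common choice rather than a value stated in the abstract, and it assumes the usual multi-class G-mean definition as the geometric mean of per-class recalls.

```python
import math


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample, given the predicted probability of its true class.

    With gamma=0 and alpha=1 this reduces to ordinary cross-entropy, -log(p).
    """
    p = min(max(p_true, 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha * (1.0 - p) ** gamma * math.log(p)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return math.prod(recalls) ** (1.0 / len(recalls))
```

The `(1 - p) ** gamma` factor shrinks the loss of well-classified samples, so training focuses on hard ones, which are often minority-class samples. G-mean drops to zero if any class is missed entirely, which is why it is used as a check on balanced recognition.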

(This article belongs to the Section Chemical, Civil and Environmental Engineering)

38 pages, 8524 KB

Open AccessArticle

Prediction of Compressive Strength of Carbon Nanotube Reinforced Concrete Based on Multi-Dimensional Database

by Ao Yan, Shengdong Zhang, Zhuoxuan Li, Peng Zhu and Yuching Wu

Buildings 2025, 15(23), 4349; https://doi.org/10.3390/buildings15234349 - 1 Dec 2025

Cited by 2 | Viewed by 948

Abstract

The incorporation of carbon nanotubes (CNTs) enhances the mechanical properties of cement-based materials by inhibiting micro-crack propagation. Machine learning provides an efficient approach for predicting the compressive strength of CNT-reinforced concrete, yet existing studies often lack important features and rely on less adaptive models. To address these issues, a multi-dimensional database (429 experimental data points) covering 11 factors (including cement mix ratio, CNT morphology, and dispersion process) was constructed. A hierarchical model verification and optimization was conducted: traditional regression models (Multiple Linear Regression, Multiple Polynomial Regression (MPR), Multivariate Adaptive Regression Splines), a mainstream model (Support Vector Regression (SVR)), and ensemble learning models (Random Forest, eXtreme Gradient Boosting (XGB), Light Gradient Boosting Machine optimized by Particle Swarm Optimization (PSO)/Bayesian Optimization (BO)) were trained, compared, and evaluated. MPR performs best (test set R² = 0.856) among traditional regression models, while SVR (test set R² = 0.824) is less accurate. The highest accuracy among ensemble models is achieved by the PSO-optimized XGB model, with R² = 0.910 (test set). PSO outperforms BO in optimization precision, while BO is much more efficient. Water–cement ratio, age, and sand–cement ratio are the primary influencing factors for strength. Among CNT parameters, the inner diameter has a greater impact than the length and outer diameter. Optimal CNT parameters are CNT–cement mass ratio 0.1–0.3%, inner diameter ≥ 7.132 nm, and length 1–15 μm. Surfactant polycarboxylate can increase strength, while OH⁻ functional groups can decrease it. These findings, integrated into the high-precision PSO-XGB model, provide a powerful tool for optimizing the mix design of CNT-reinforced concrete, accelerating its development and application in the industry.
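The models in this abstract are compared by test-set R². As a minimal sketch, the coefficient of determination can be computed as follows. The function name `r_squared` is illustrative, and the abstract does not state which implementation the authors used.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot
```

An R² of 1.0 means perfect prediction, and 0.0 means the model does no better than always predicting the mean of the observed strengths.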

(This article belongs to the Section Building Materials, and Repair & Renovation)

34 pages, 466 KB

Open AccessArticle

biLorentzFM: Hyperbolic Multi-Objective Deep Learning for Reciprocal Recommendation

by Kübra Karacan Uyar and Yücel Batu Salman

Appl. Sci. 2025, 15(22), 12340; https://doi.org/10.3390/app152212340 - 20 Nov 2025

Cited by 1 | Viewed by 1424

Abstract

Reciprocal recommendation requires satisfying preferences on both sides of a match, which differs from standard one-sided settings and often involves hierarchical structure (e.g., skills, seniority, education). We present biLorentzFM, a multi-objective framework that integrates hyperbolic geometry into factorization machine architectures using Lorentz embeddings with learnable curvature and manifold-aware optimization. The approach addresses whether a geometric structure aligned with hierarchical relationships can improve reciprocal matching without requiring major architectural changes. On a large-scale recruitment dataset from Kariyer.Net (1,150,302 interactions, 229,805 candidates), the model achieves candidate and company AUCs of 0.9964 and 0.9913, respectively, representing 6.6% and 6.0% improvements over the strongest Euclidean baseline while maintaining practical inference latency (2.1 ms per batch). Cross-validation analysis confirms robustness (5-fold: 0.9813 ± 0.0002; 3-seed: 0.9964 ± 0.0012) with very large effect sizes (Cohen’s d = 2.89–3.08). Although the per-epoch training time increases by 23.5% due to manifold operations, faster convergence (12 vs. 18 epochs) reduces the total training time by 17.8%. Cross-domain evaluation on Speed Dating data demonstrates generalization beyond explicit hierarchies with a 2.8% AUC improvement despite lacking structured taxonomies. Learned curvature parameters differ by entity type, providing interpretable indicators of hierarchical structure strength. Ablation studies isolate contributions from geometric structure (6.6%), learnable curvature (4.7%), multi-objective learning (2.1%), and explicit feature interactions (0.6%). A systematic comparison reveals that Lorentz embeddings outperform Poincaré ball implementations by 4.4% AUC under identical conditions, which is attributed to numerical stability advantages. The results indicate that pairing standard recommendation architectures with geometry reflecting hierarchical relationships can provide consistent improvements for reciprocal matching, while limitations including cold-start performance, computational overhead at extreme scale, and static hierarchy assumptions suggest directions for future work on adaptive curvature, fairness constraints, and dynamic taxonomies.
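To make the Lorentz-model geometry concrete, the sketch below computes the geodesic distance between two points on a hyperboloid of constant negative curvature. It uses the textbook formulation, not the biLorentzFM code. The parameter `c` (curvature magnitude, so the curvature is -c) and the helper names are assumptions made for illustration, and in the paper the curvature is learned rather than fixed.

```python
import math


def lift_to_hyperboloid(v, c=1.0):
    """Map Euclidean coordinates v to a point x on the hyperboloid <x, x>_L = -1/c."""
    x0 = math.sqrt(1.0 / c + sum(vi * vi for vi in v))  # time-like coordinate
    return [x0] + list(v)


def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum of products of the remaining coordinates."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lorentz_distance(x, y, c=1.0):
    """Geodesic distance between two hyperboloid points for curvature -c."""
    # Clamp to 1.0: rounding can push the argument slightly below acosh's domain.
    arg = max(1.0, -c * lorentz_inner(x, y))
    return math.acosh(arg) / math.sqrt(c)
```

A larger `c` means stronger curvature and shorter distances for the same hyperbolic angle, which gives some intuition for why a learned per-entity curvature could serve as an indicator of hierarchy strength.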

Search Results (64)
