MDPI - Publisher of Open Access Journals

37 pages, 28225 KB

Open Access Article

Hierarchical Spectral Modelling of Pasture Nutrition: From Laboratory to Sentinel-2 via UAV Hyperspectral

by Jason Barnetson, Hemant Raj Pandeya and Grant Fraser

AgriEngineering 2026, 8(4), 143; https://doi.org/10.3390/agriengineering8040143 - 7 Apr 2026

Viewed by 206

This study demonstrates a hierarchical spectral modelling approach for predicting pasture nutrition metrics using TabPFN (Tabular Prior-Data Fitted Network), a transformer-based machine learning architecture. In the face of climate variability, aligning stocking rates with pasture resources is crucial for sustainable livestock grazing, requiring accurate assessments of both pasture biomass and nutrient composition. Our research, conducted across diverse growth stages at five tropical and subtropical savanna rangeland properties in Queensland, Australia, with native and introduced C4 grasses, employed a hierarchical sampling and modelling strategy that scales from laboratory spectroscopy to Sentinel-2 satellite predictions via uncrewed aerial vehicle (UAV) hyperspectral imaging. Spectral data were collected from leaf (laboratory spectroscopy) through field (point measurements), UAV hyperspectral imaging, and Sentinel-2 satellite imagery. Traditional laboratory wet chemistry methods determined plant leaf and stem nutrient content, from which crude protein (CP = total nitrogen (TN) × 6.25) and dry matter digestibility (DMD = 88.9 − 0.779 × acid detergent fibre (ADF)) were derived. TabPFN models were trained at each spatial scale, achieving validation R² of 0.76 for crude protein at the leaf scale, 0.95 at the UAV scale, and 0.92 at the Sentinel-2 satellite scale. For dry matter digestibility, validation R² was 0.88 at the UAV scale and 0.73 at the Sentinel-2 scale. A pasture classification masking approach using a deep neural network with 98.6% accuracy (7 classes) was implemented to focus predictions on productive pasture areas, excluding bare soil and woody vegetation. The Sentinel-2 models were trained on 462 samples from 19 site–date combinations across 11 field sites. The TabPFN architecture provided notable advantages over traditional neural networks: no hyperparameter tuning required, faster training, and superior generalisation from limited training samples. These results demonstrate the potential for accurate and efficient prediction and mapping of pasture quality across large areas (100s–1000s km²) using freely available satellite imagery and open-source machine learning frameworks. Full article

(This article belongs to the Special Issue The Application of Remote Sensing for Agricultural Monitoring)
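
The two derived nutrition metrics quoted in the abstract above are simple linear conversions, and a minimal Python sketch may make them concrete. The function names and the sample values are hypothetical and chosen only for illustration; only the two formulas come from the abstract.

```python
def crude_protein(total_nitrogen_pct: float) -> float:
    """Crude protein (%) from total nitrogen (%): CP = TN x 6.25."""
    return total_nitrogen_pct * 6.25


def dry_matter_digestibility(adf_pct: float) -> float:
    """Dry matter digestibility (%) from acid detergent fibre (%):
    DMD = 88.9 - 0.779 x ADF."""
    return 88.9 - 0.779 * adf_pct


# Hypothetical wet-chemistry values for one sample (not from the study).
sample = {"total_nitrogen_pct": 1.6, "adf_pct": 40.0}

cp = crude_protein(sample["total_nitrogen_pct"])
dmd = dry_matter_digestibility(sample["adf_pct"])
print(f"CP = {cp:.2f} %, DMD = {dmd:.2f} %")
```

In a workflow like the one described, values of this kind would presumably serve as regression targets for the spectral models at each scale.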

18 pages, 10397 KB

Open Access Article

Multiple Imputation of a Continuous Outcome with Fully Observed Predictors Using TabPFN

by Jerome Sepin

Stats 2026, 9(2), 38; https://doi.org/10.3390/stats9020038 - 1 Apr 2026

Viewed by 257

Abstract

Handling missing data is a central challenge in quantitative research, particularly when datasets exhibit complex dependency structures, such as nonlinear relationships and interactions. Multiple imputation (MI) via fully conditional specification (FCS), as implemented in the MICE R package, is widely used but relies on user-specified models that may fail to capture complex dependency structures, especially in high-dimensional settings, or on more sophisticated algorithms that are considered data-hungry. This paper investigates the performance of TabPFN, a transformer-based, pretrained foundation model developed for tabular prediction tasks, for MI. TabPFN is pretrained on millions of synthetic datasets and approximates posterior predictive distributions without dataset-specific retraining, offering a compelling solution for imputing complex missing data in small to moderately sized samples. We conduct a simulation study focusing on univariate missingness in a continuous outcome with complete predictors, comparing TabPFN with standard MI methods. Performance is evaluated using bias, standard error, and coverage of the marginal mean estimand across a range of data-generating and missingness mechanisms. Our results show that TabPFN yields competitive or superior performance relative to Classification and Regression Trees and Predictive Mean Matching. These findings highlight TabPFN as a promising tool for missing data imputation, with particular relevance to health research. Full article

(This article belongs to the Special Issue Statistical Methods for Hypothesis Testing)
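
Multiple imputation produces several completed datasets, and the per-dataset estimates of the marginal mean are conventionally combined with Rubin's rules. The abstract does not state which pooling step the study used, so the sketch below is an assumption: it shows the standard rules, with hypothetical function names and made-up numbers.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors
    using Rubin's rules. Returns (pooled estimate, total variance)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical marginal-mean estimates from five imputed datasets.
est = [10.0, 10.4, 9.8, 10.2, 10.1]
var = [0.04, 0.04, 0.04, 0.04, 0.04]
pooled, total_var = pool_rubin(est, var)
print(f"pooled mean = {pooled:.3f}, pooled SE = {total_var ** 0.5:.3f}")
```

The pooled standard error feeds the confidence interval whose coverage a simulation study of this kind would evaluate.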

34 pages, 5296 KB

Open Access Article

An Interpretable Pretrained Tabular Modeling Framework for Predicting IRI Across Multiple Pavement Structural Configurations

by Liang Qin, Tong Liu, Qianhui Sun and Mingxin Tang

Buildings 2026, 16(7), 1358; https://doi.org/10.3390/buildings16071358 - 29 Mar 2026

Viewed by 371

Abstract

With increasing traffic loads and increasingly complex climate conditions, accurate prediction of the International Roughness Index (IRI) of asphalt pavements is crucial for developing effective maintenance plans. However, traditional regression models have limitations in capturing the coupled effects of traffic, structure, and environmental factors. To overcome this limitation, this study constructed a dataset containing 10,836 samples based on the Long-Term Pavement Performance (LTPP) database, integrating traffic load, pavement structure parameters, and climate variables. The variance inflation factor (VIF) and correlation analysis were used to validate the effectiveness of feature selection. We trained nine machine learning models and optimized the hyperparameters using a Bayesian optimization method with five-fold cross-validation to ensure good generalization ability. Results show that the TabPFN model, based on prior information, achieved the best overall performance with a coefficient of determination R² = 0.9474 and a low prediction error (RMSE = 0.138) on the test set. Paired t-tests based on cross-validation further confirmed that TabPFN’s predictive performance is statistically superior to the baseline model. SHAP and generalized additive model (GAM) analyses indicate that traffic load is the main driver of IRI growth, while structural layer thickness, within a certain range, can mitigate pavement roughness. Climatic factors have indirect long-term effects through cumulative environmental exposure. Although the main drivers differ slightly among different pavement structures, traffic load consistently plays a dominant role. To enhance the model’s practical applicability, we also developed a user-friendly graphical interface (GUI) for fast and accurate IRI prediction. Full article

(This article belongs to the Special Issue From Theory to Practice: Artificial Intelligence Applications in the Built Environment)

28 pages, 6373 KB

Open Access Article

Mitigating Urban-Centric Bias to Address the Rural Eligibility Discovery Lag

by Guiyan Jiang and Donghui Zhang

Land 2026, 15(4), 535; https://doi.org/10.3390/land15040535 - 25 Mar 2026

Viewed by 361

Abstract

Urban sustainability depends on rural hinterlands, yet national-scale evaluation and AI screening often rely on urban-centric proxies, which can under-recognize remote villages where the evidence base is sparse. Using China’s national honored-village programme (N = 24,450) as a case, we examine how recognition patterns change when data availability and observability are unequal across regions, with a focus on the Qinghai–Tibetan Plateau (QTP), where 923 honored villages account for only 3.78% of the national total. We interpret urban-centric proxy reliance as the tendency for recognition patterns to correlate with urban-linked observability signals (e.g., nighttime lights). In this study, discovery lag refers to situations where villages exhibit characteristics similar to historically recognized villages but remain unrecognized under the current honor regime due to uneven data availability and observability. Methodologically, we build a scene-aware predictive framework that integrates multi-source geospatial indicators and explicitly handles extreme imbalance and environmental heterogeneity to estimate recognition likelihood under the current honor regime, treating national honor lists as administratively produced recognition outcomes rather than objective measures of village value. The model highlights four high-probability nomination belts on the QTP and reveals a pronounced DEM–NTL decoupling: the median NTL of currently honored QTP villages is 0, suggesting that NTL-based urban proxies can fail in high-altitude, data-scarce contexts. Overall, the observed under-representation is consistent with uneven observability and institutional constraints within the current honor system, and the proposed framework provides a scalable diagnostic and screening tool for identifying villages with high predicted recognition likelihood and supporting more evidence-aware rural data collection. Full article

(This article belongs to the Special Issue Rethinking Urban–Rural Dynamics Through the Lens of Social Geography)

19 pages, 721 KB

Open Access Article

Evaluating EEG-Based Seizure Classification Using Foundation and Classical Ensemble Models

by George Obaido and Ebenezer Esenogho

Appl. Sci. 2026, 16(7), 3120; https://doi.org/10.3390/app16073120 - 24 Mar 2026

Viewed by 278

Abstract

Electroencephalogram (EEG)-based seizure classification remains challenging due to inter-subject variability and heterogeneous signal characteristics. Foundation models offer a promising alternative to dataset-specific training by leveraging pretrained priors. In this study, we evaluate a tabular foundation model, the Tabular Prior-Data Fitted Network (TabPFN), against classical ensemble baselines (gradient boosting, random forests, AdaBoost, and XGBoost) for EEG seizure segment classification. We use subject-independent GroupKFold cross-validation with out-of-fold evaluation to assess generalization to unseen individuals. Experiments on the Bangalore EEG Epilepsy Dataset (BEED) and the University of Bonn (Bonn) dataset show that TabPFN achieves higher accuracy than classical ensembles, reaching 99.7% on BEED and 99.6% on Bonn. These results suggest that pretrained tabular priors can be effective in feature-based EEG pipelines where subject-level generalization is required. Full article

(This article belongs to the Special Issue AI-Driven Healthcare)
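
Subject-independent cross-validation means that all segments from one subject fall on the same side of every split. The sketch below is a simplified pure-Python stand-in for that idea, not the study's code and not a reimplementation of scikit-learn's GroupKFold; the greedy fold assignment and the subject labels are assumptions made for illustration.

```python
from collections import defaultdict


def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs in which no group label
    appears in both the training and the test indices."""
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)
    if n_splits < 2 or n_splits > len(by_group):
        raise ValueError("n_splits must be between 2 and the number of groups")

    # Greedy balancing: place the largest groups first,
    # always into the fold that currently holds the fewest samples.
    folds = [[] for _ in range(n_splits)]
    ordered = sorted(by_group, key=lambda g: (-len(by_group[g]), str(g)))
    for label in ordered:
        smallest = min(range(n_splits), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_group[label])

    for f in range(n_splits):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for k in range(n_splits) if k != f for i in folds[k])
        yield train_idx, test_idx


# Hypothetical subject label for each EEG segment.
subjects = ["s1", "s1", "s2", "s2", "s3", "s3", "s4"]
for train_idx, test_idx in group_k_fold(subjects, n_splits=2):
    held_out = sorted({subjects[i] for i in test_idx})
    print(f"held-out subjects: {held_out}, test segments: {test_idx}")
```

Each segment is predicted exactly once while its subject is held out, which is what makes the pooled out-of-fold score an estimate of performance on unseen individuals.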

21 pages, 6212 KB

Open Access Article

Coastal Soil Salinity Inversion Using UAV Multispectral Imagery and an Interpretable Stacking Algorithm

by Xianfeng Hu, Dongfeng Han, Quan Qin, Yanhong Que, Han Wang, Donghan Feng, Rui Chen, Jinkui Duan, Yanpeng Li and Feng Li

Remote Sens. 2026, 18(5), 671; https://doi.org/10.3390/rs18050671 - 24 Feb 2026

Viewed by 419

Abstract

Accurate and timely monitoring of soil salinity is essential for the sustainable management and remediation of coastal salinization. This study utilized a UAV-based remote sensing platform to collect multispectral imagery and concurrent in situ soil salinity samples from an experimental zone within the Yellow River Delta National Nature Reserve in July 2024. We constructed multiple spectral indices and employed advanced feature selection methods—namely VIP, MultiSURF, and PSO-SFLA—to identify the most informative index combination. We established a soil salinity retrieval model utilizing a stacking ensemble framework. This architecture integrated TabPFN, SVM, and Ridge regression as the base learners, while employing XGBoost as the meta-learner to synthesize the final predictions. Model interpretability was assessed using SHAP (SHapley Additive explanations) values, while predictive performance was evaluated using the coefficient of determination (R²), Standardized Root Mean Square Error (S_RMSE), and the Ratio of Performance to Deviation (RPD). Results indicate that the stacking model, when coupled with PSO-SFLA for feature selection, outperformed all other model configurations. It achieved the highest prediction accuracy on the test set, with an R² of 0.754, S_RMSE of 0.310, and RPD of 1.941. The resulting soil salinity distribution map exhibited a high degree of spatial agreement with the ground-truth survey data. This study demonstrates that leveraging a stacking algorithm with UAV multispectral data provides an accurate and reliable method for monitoring soil salinity in coastal wetlands, offering valuable technical support for effective soil salinization management. Full article
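
Two of the evaluation metrics named in the abstract above, R² and the Ratio of Performance to Deviation (RPD), can be sketched in a few lines of Python. RPD is taken here as the standard deviation of the observed values divided by the RMSE; using the sample standard deviation is an assumption, since conventions differ, and the abstract's S_RMSE is left out because its exact definition is not given. The data are made up.

```python
from math import sqrt
from statistics import mean, stdev


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error of the predictions."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def rpd(y_true, y_pred):
    """Ratio of Performance to Deviation: SD of observations / RMSE."""
    return stdev(y_true) / rmse(y_true, y_pred)


# Hypothetical observed and predicted salinity values.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 2.5, 4.0, 5.0]
print(f"R2 = {r_squared(observed, predicted):.3f}, RPD = {rpd(observed, predicted):.3f}")
```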

20 pages, 4997 KB

Open Access Article

A Data-Driven Reduced-Order Model for Rotary Kiln Temperature Field Prediction Using Autoencoder and TabPFN

by Ya Mao, Yuhang Li, Yanhui Lai and Fangshuo Fan

Appl. Sci. 2026, 16(4), 2029; https://doi.org/10.3390/app16042029 - 18 Feb 2026

Viewed by 388

Abstract

The accurate reconstruction of the internal temperature field in rotary kilns is critical for optimizing the clinker calcination process and ensuring energy efficiency. In this study, a rapid and high-fidelity surrogate modeling framework is proposed, utilizing snapshot ensembles generated by full-order Computational Fluid Dynamics (CFD) simulations to reconstruct the temperature field of the axial center section. The framework incorporates a symmetric Autoencoder (AE) coupled with a TabPFN network as its core components. Capitalizing on the kiln’s strong axial symmetry, this reduction–regression system efficiently maps the high-dimensional nonlinear thermodynamic topology of the central section into a compact low-dimensional latent manifold via AE, while utilizing TabPFN to establish a robust mapping between operating boundary conditions and these latent features. By leveraging the In-Context Learning (ICL) mechanism for prior-data fitting, TabPFN effectively overcomes the data scarcity inherent in high-cost CFD sampling. Predictive results demonstrate that the model achieves a coefficient of determination (R²) of 0.897 for latent feature regression, outperforming traditional algorithms by 6.53%. In terms of field reconstruction on the test set, the model yields an average temperature error of 15.31 K. Notably, 93.83% of the nodal errors are confined within a narrow range of 0–50 K, and the reconstructed distributions exhibit high consistency with the CFD benchmarks. Furthermore, compared to the hours required for full-scale simulations, the inference time is reduced to 0.45 s, representing a speedup of four orders of magnitude. Consequently, the predictive system demonstrates excellent accuracy and efficiency, serving as an effective substitute for traditional models to realize online monitoring and intelligent optimization. Full article

(This article belongs to the Special Issue Fuel Cell Technologies in Power Generation and Energy Recovery)

24 pages, 2125 KB

Open Access Article

MIC-SSO: A Two-Stage Hybrid Feature Selection Approach for Tabular Data

by Wei-Chang Yeh, Yunzhi Jiang, Hsin-Jung Hsu and Chia-Ling Huang

Electronics 2026, 15(4), 856; https://doi.org/10.3390/electronics15040856 - 18 Feb 2026

Viewed by 342

Abstract

High-dimensional structured datasets are common in fields such as semiconductor manufacturing, healthcare, and finance, where redundant and irrelevant features often increase computational cost and reduce predictive accuracy. Feature selection mitigates these issues by identifying a compact, informative subset of features, enhancing model efficiency, performance, and interpretability. This study proposes Maximal Information Coefficient–Simplified Swarm Optimization (MIC-SSO), a two-stage hybrid feature selection method that combines the MIC as a filter with SSO as a wrapper. In Stage 1, MIC ranks feature relevance and removes low-contribution features; in Stage 2, SSO searches for an optimal subset from the reduced feature space using a fitness function that integrates the Matthews Correlation Coefficient (MCC) and feature reduction rate to balance accuracy and compactness. Experiments on five public datasets compare MIC-SSO with multiple hybrid, heuristic, and literature-reported methods, with results showing superior predictive accuracy and feature compression. The method’s ability to outperform existing approaches in terms of predictive accuracy and feature compression underscores its broader significance, offering a powerful tool for data analysis in fields like healthcare, finance, and semiconductor manufacturing. Statistical tests further confirm significant improvements over competing approaches, demonstrating the method’s effectiveness in integrating the efficiency of filters with the precision of wrappers for high-dimensional tabular data analysis. Full article

(This article belongs to the Special Issue Feature Papers in Networks: 2025–2026 Edition)
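
The Stage 2 fitness described in the MIC-SSO abstract combines the Matthews Correlation Coefficient with the feature reduction rate. The abstract does not give the exact combination, so the weighted sum and the weight of 0.9 below are assumptions for illustration only; the MCC formula itself is the standard one.

```python
from math import sqrt


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.
    Returns 0.0 when the denominator is zero (a common convention)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def fitness(mcc_value: float, n_selected: int, n_total: int, weight: float = 0.9) -> float:
    """Hypothetical fitness: weighted sum of MCC and the fraction of
    features removed. The weight is an assumed value, not from the paper."""
    reduction_rate = 1 - n_selected / n_total
    return weight * mcc_value + (1 - weight) * reduction_rate


# Hypothetical candidate subset: 5 of 20 features kept.
score = mcc(tp=40, tn=45, fp=5, fn=10)
print(f"MCC = {score:.3f}, fitness = {fitness(score, 5, 20):.3f}")
```

A wrapper search would evaluate this kind of score for each candidate subset, so a higher weight favours accuracy and a lower one favours compactness.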

22 pages, 2462 KB

Open Access Article

AI-Driven Weather Data Superresolution via Data Fusion for Precision Agriculture

by Jiří Pihrt, Petr Šimánek, Miroslav Čepek, Karel Charvát, Alexander Kovalenko, Šárka Horáková and Michal Kepka

Sensors 2026, 26(4), 1297; https://doi.org/10.3390/s26041297 - 17 Feb 2026

Viewed by 595

Abstract

Accurate field-scale meteorological information is required for precision agriculture, but operational numerical weather prediction products remain spatially coarse and cannot resolve local microclimate variability. This study proposes a data fusion superresolution workflow that combines global GFS predictors (0.25°), regional station observations from Southern Moravia (Czech Republic), and static physiographic descriptors (elevation and terrain gradients) to predict the 2 m air temperature 24 h ahead and to generate spatially continuous high-resolution temperature fields. Several model families (LightGBM, TabPFN, Transformer, and Bayesian neural fields) are evaluated under spatiotemporal splits designed to test generalization to unseen time periods and unseen stations; spatial mapping is implemented via a KNN interpolation layer in the physiographic feature space. All learned configurations reduce the mean absolute error relative to raw GFS across splits. In the most operationally relevant regime (unseen stations and unseen future period), TabPFN-KNN achieves the lowest MAE (1.26 °C), corresponding to an ≈24% reduction versus GFS (1.66 °C). The results support the feasibility of an operational, sensor-infrastructure-compatible pipeline for high-resolution temperature superresolution in agricultural landscapes. Full article

(This article belongs to the Special Issue Innovations in Remote Sensing and AI-Enabled Sensing for Smart Agriculture)
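
The abstract mentions a KNN interpolation layer in physiographic feature space and reports a reduction in MAE relative to raw GFS. The sketch below illustrates both under stated assumptions: neighbours are found by Euclidean distance and averaged without weighting, which may differ from the study's layer, and all station data are made up. Only the two MAE values come from the abstract.

```python
from math import dist


def knn_interpolate(query, station_features, station_values, k=3):
    """Average the values of the k stations closest to `query`
    in feature space (unweighted mean, an assumed choice)."""
    ranked = sorted(range(len(station_features)),
                    key=lambda i: dist(query, station_features[i]))
    nearest = ranked[:k]
    return sum(station_values[i] for i in nearest) / len(nearest)


def relative_reduction(baseline_error: float, model_error: float) -> float:
    """Fractional error reduction of a model versus a baseline."""
    return (baseline_error - model_error) / baseline_error


# Hypothetical standardized features (elevation, terrain gradient) per station.
features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
temps_c = [10.0, 12.0, 14.0, 30.0]
print(knn_interpolate((0.1, 0.1), features, temps_c, k=3))

# MAE values quoted in the abstract: GFS 1.66 °C, TabPFN-KNN 1.26 °C.
print(f"{relative_reduction(1.66, 1.26):.1%}")
```

Because distances mix different physical quantities, features would normally be standardized before the neighbour search.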

23 pages, 2657 KB

Open Access Article

Benchmarking Tabular Foundation Models for Total Volatile Fatty Acid Prediction in Anaerobic Digestion

by Bibars Amangeldy, Zhanel Baigarayeva, Nurdaulet Tasmurzayev, Assiya Boltaboyeva, Baglan Imanbek, Marlen Maulenbekov, Sarsenbek Zhussupbekov, Waldemar Wojcik, Mergul Kozhamberdieva and Akzhan Konysbekova

Algorithms 2026, 19(2), 127; https://doi.org/10.3390/a19020127 - 5 Feb 2026

Viewed by 527

Abstract

Monitoring the concentration of Total Volatile Fatty Acids (TVFA (M)) is critical for ensuring the stability and efficiency of the Anaerobic Digestion (AD) process, although conventional laboratory methods are often time-consuming and hinder real-time control. This study develops soft sensors based on machine learning techniques to predict TVFA (M) levels using readily available parameters such as pH, pCO₂, and Total Ammoniacal Nitrogen (TAN). A primary contribution of this work is the comprehensive benchmarking of the proposed approach against current State-of-the-Art (SOTA) deep learning and machine learning models, including XGBoost, Random Forest, TorchMLP, and the advanced RealTabPFN-v2.5. Experimental results demonstrate that the RealTabPFN-v2.5 model outperforms other modern algorithms by achieving the highest accuracy with an R² of 0.889 and the lowest error rate with an RMSE of 0.0079. SHAP (SHapley Additive exPlanations) analysis was employed to interpret the model’s predictions, identifying pH as the most influential factor in TVFA (M) prediction and confirming that the model’s decision-making process aligns with established biological principles. These findings highlight the significant potential of integrating SOTA machine learning models into intelligent monitoring systems for the automation and optimization of biogas production processes. Full article

(This article belongs to the Special Issue AI Applications and Modern Industry)

26 pages, 1051 KB

Open Access Article

Neural Signatures of Speed and Regular Reading: A Machine Learning and Explainable AI (XAI) Study of Sinhalese and Japanese

by Thishuli Walpola, Namal Rathnayake, Hoang Ngoc Thanh, Niluka Dilhani and Atsushi Senoo

Information 2026, 17(1), 108; https://doi.org/10.3390/info17010108 - 21 Jan 2026

Cited by 1 | Viewed by 426

Abstract

Reading speed is hypothesized to have distinct neural signatures across orthographically diverse languages, yet cross-linguistic evidence remains limited. We investigated this by classifying speed readers versus regular readers among Sinhalese and Japanese adults (n = 142) using task-based fMRI and 35 supervised machine learning classifiers. Functional activation was extracted from 12 reading-related cortical regions. We introduced Fuzzy C-Means (FCM) clustering for data augmentation and Shapley additive explanations (SHAP) for model interpretability, enabling evaluation of region-wise contributions to reading speed classification. The best model, an FT-TABPFN network with FCM augmentation, achieved 81.1% test accuracy in the Combined cohort. In the Japanese-only cohort, Quadratic SVM and Subspace KNN each reached 85.7% accuracy. SHAP analysis revealed that the angular gyrus (AG) and inferior frontal gyrus (triangularis) were the strongest contributors across cohorts. Additionally, the anterior supramarginal gyrus (ASMG) appeared as a higher contributor in the Japanese-only cohort, while the posterior superior temporal gyrus (PSTG) contributed strongly to both cohorts separately. However, the posterior middle temporal gyrus (PMTG) showed little or no contribution to the model classification in each cohort. These findings demonstrate the effectiveness of interpretable machine learning for decoding reading speed, highlighting both universal neural predictors and language-specific differences. Our study provides a novel, generalizable framework for cross-linguistic neuroimaging analysis of reading proficiency. Full article

24 pages, 1576 KB

Open Access Article

Non-Imaging Differential Diagnosis of Lower Limb Osteoarthritis: An Interpretable Machine Learning Framework

by Zhanel Baigarayeva, Assiya Boltaboyeva, Baglan Imanbek, Bibars Amangeldy, Nurdaulet Tasmurzayev, Kassymbek Ozhikenov, Assylbek Ozhiken, Zhadyra Alimbayeva and Naoya Maeda-Nishino

Algorithms 2026, 19(1), 87; https://doi.org/10.3390/a19010087 - 20 Jan 2026

Viewed by 431

Abstract

Background: Osteoarthritis (OA) is a prevalent chronic degenerative disorder, with coxarthrosis (hip OA) and gonarthrosis (knee OA) representing its most significant clinical manifestations. While diagnosis typically relies on imaging, such methods can be resource-intensive and insensitive to early disease trajectories. Objective: This study aims to achieve the differential diagnosis of coxarthrosis and gonarthrosis using solely routine preoperative clinical and laboratory data, benchmarking state-of-the-art machine learning algorithms. Methods: A retrospective analysis was conducted on 893 patients (617 with knee OA, 276 with hip OA) from a clinical hospital in Almaty, Kazakhstan. The study evaluated a diverse portfolio of models, including gradient boosting decision trees (LightGBM, XGBoost, CatBoost), deep learning architectures (RealMLP, TabDPT, TabM), and the pretrained tabular foundation model RealTabPFN v2.5. Results: The RealTabPFN v2.5 (Tuned) model achieved superior performance, recording a mean ROC–AUC of 0.9831, accuracy of 0.9485, and an F1-score of 0.9474. SHAP interpretability analysis identified heart rate (66.2%) and age (18.1%) as the dominant predictors driving the model’s decision-making process. Conclusion: Pretrained tabular foundation models demonstrate exceptional capability in distinguishing OA subtypes using limited clinical datasets, outperforming traditional ensemble methods. This approach offers a practical, high-performance triage tool for primary clinical assessment in resource-constrained settings. Full article

(This article belongs to the Special Issue Machine Learning for Advanced Healthcare: Bridging Innovation and Clinical Implementation)

17 pages, 2326 KB

Open Access Article

Explainable AutoML with Uncertainty Quantification for CO₂-Cured Concrete Compressive Strength Prediction

by Liping Wang, Yuanfeng Wang, Chengcheng Shi, Baolong Ma, Yinshan Liu, Boqun Zhang, Shaoqin Xue, Xinlei Chang and Xiaodong Liu

Buildings 2026, 16(1), 89; https://doi.org/10.3390/buildings16010089 - 24 Dec 2025

Viewed by 611

Abstract

The cement and concrete industry is one of the primary sources of anthropogenic carbon dioxide (CO₂) emissions globally, responsible for nearly 8% of total emissions, making the need for a low-carbon transition urgent. CO₂ curing provides both strength enhancement and carbon sequestration, yet the compressive strength of such concrete remains challenging to predict due to limited and strongly coupled experimental factors. This study developed an explainable Automated Machine Learning (AutoML) framework with integrated uncertainty quantification to predict the 28-day compressive strength of CO₂-cured concrete. The framework was built using 198 standardized experimental data points and trained with four algorithms: Random Forest (RF), Support Vector Regression (SVR), eXtreme Gradient Boosting (XGBoost), and the transformer-based Tabular Prior-Data Fitted Network (TabPFN). To enhance model accuracy and efficiency, stratified cross-validation, hyperparameter optimization, and bootstrap-based uncertainty analysis were applied during training. The results show that TabPFN achieves the highest predictive accuracy (test R² = 0.959) and maintains a stable 95% prediction interval. SHapley Additive exPlanations (SHAP) analysis indicates that cement content, aggregate composition, water–binder (W/B) ratio, and CO₂ curing time are the dominant factors, with an optimal W/B ratio near 0.40. Interaction analysis further reveals synergistic effects between cement content and W/B, and a strengthening coupling between curing time and CO₂ concentration at longer durations. The framework enhances predictive reliability and explainability, supporting mixture design and curing optimization for low-carbon concrete development. Full article

(This article belongs to the Section Building Materials, and Repair & Renovation)
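
The abstract refers to bootstrap-based uncertainty analysis and a 95% prediction interval without giving the construction. One common option, shown here purely as an assumption, is to resample held-out residuals with replacement, add them to a point prediction, and take percentiles. The function names, residuals and the prediction of 40 MPa are hypothetical.

```python
import random


def percentile(sorted_values, q):
    """Linearly interpolated percentile of an ascending list, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def bootstrap_prediction_interval(point_prediction, residuals,
                                  n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval from residuals resampled with replacement.
    Assumes residuals are exchangeable across samples."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    draws = sorted(point_prediction + rng.choice(residuals) for _ in range(n_boot))
    return percentile(draws, alpha / 2), percentile(draws, 1 - alpha / 2)


# Hypothetical validation residuals (MPa) and one point prediction.
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
low, high = bootstrap_prediction_interval(40.0, residuals)
print(f"95% interval: [{low:.1f}, {high:.1f}] MPa")
```

An interval built this way has the same width for every sample, so it would not capture uncertainty that varies with the mixture design.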

30 pages, 8582 KB

Open Access Article

Machine Learning Approaches for Assessing Avocado Alternate Bearing Using Sentinel-2 and Climate Variables—A Case Study in Limpopo, South Africa

by Muhammad Moshiur Rahman, Andrew Robson and Theo Bekker

Remote Sens. 2025, 17(24), 3935; https://doi.org/10.3390/rs17243935 - 5 Dec 2025

Viewed by 1147

Abstract

Alternate (irregular) bearing, characterized by large fluctuations in fruit yield between consecutive years, remains a major constraint to sustainable avocado (Persea americana) production. This study aimed to assess the potential of satellite remote sensing and climatic variables to characterize and predict alternate bearing patterns in commercial orchards in Tzaneen, Limpopo Province, South Africa. Historical yield data (2018–2024) from 46 “Hass” avocado blocks were analyzed alongside Sentinel-2-derived vegetation indices (VIs: NDVI, GNDVI, NDRE, CIG, CIRE, EVI2, LSWI) and flowering indices (FIs: WYI, NDYI, MTYI). To align temporal scales, all VIs and FIs were aggregated into eight quarterly averages from the two years preceding each yield year and spatially averaged across each orchard block. Climatic predictors including maximum temperature (Tmax), minimum temperature (Tmin), vapor pressure deficit (VPD), and precipitation were screened against historical yields to identify critical periods, with June–October emerging as the most influential months, and these variables were aggregated accordingly to match annual alternate bearing patterns. Five machine learning (ML) algorithms (Random Forest, XGBoost, CatBoost, LightGBM, and TabPFN) were trained and tested using a Leave-One-Year-Out (LOYO) approach. Results showed that VPD, Tmin, and Tmax during the flowering period (July–September) were the most influential variables affecting subsequent yields. TabPFN achieved the highest predictive accuracy (Accuracy = 0.88; AUC = 0.95) and strongest temporal generalization. Spectral gradients between flowering and early fruit drop were lower during “on” years, reflecting stable canopy vigor. This combined use of remote sensing and climatic variables in an ML framework represents a novel approach, and the findings demonstrate that integrating remote sensing and climatic indicators enables early discrimination of “on” and “off” years, supporting proactive orchard management and improved yield stability.
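The Leave-One-Year-Out (LOYO) scheme named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `leave_one_year_out` and the toy `years` list are hypothetical, and the sketch only shows how each yield year is held out in turn while every other year is used for training.

```python
from collections import defaultdict


def leave_one_year_out(years):
    """Yield (held_out_year, train_idx, test_idx) once per distinct year.

    `years` holds one year label per sample (hypothetical layout:
    one sample per orchard block and yield year).
    """
    by_year = defaultdict(list)
    for i, year in enumerate(years):
        by_year[year].append(i)

    for held_out in sorted(by_year):
        test_idx = by_year[held_out]
        # Training rows are all samples from every other year.
        train_idx = sorted(
            i for year, idx in by_year.items() if year != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


# Toy labels for illustration only; not data from the study.
years = [2018, 2018, 2019, 2020, 2020, 2021]
folds = list(leave_one_year_out(years))
```

Because every sample from the held-out year stays out of training, each fold's score reflects how a model transfers to a season it has not seen, which is the kind of temporal generalization the abstract reports.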

(This article belongs to the Special Issue Artificial Intelligence-Based Remote Sensing for Crop Information Extraction and Status Monitoring)

23 pages, 5218 KB

Open Access Article

Development of Plasma Protein Classification Models for Alzheimer’s Disease Using Multiple Machine Learning Approaches

by Amy Tsurumi, Catherine M. Cahill, Andy J. Liu, Pranam Chatterjee, Sudeshna Das and Ami Kobayashi

Int. J. Mol. Sci. 2025, 26(23), 11673; https://doi.org/10.3390/ijms262311673 - 2 Dec 2025

Viewed by 1294

Abstract

Alzheimer’s Disease (AD) management is challenging due to limitations in detection methods. Currently, cerebrospinal fluid (CSF) biomarkers involve assessing β-amyloid (Aβ) and phosphorylated tau proteins. The lumbar puncture procedure to obtain CSF is invasive and sometimes causes significant anxiety in patients. In contrast, plasma biomarkers would allow rapid, accurate, and cost-effective diagnosis, while minimizing invasiveness and discomfort. Using a dataset involving 120 plasma proteins from clinically diagnosed AD patients versus cognitively normal subjects, we developed classification models by applying various machine learning algorithms (EBlasso, EBEN, XGBoost, LightGBM, TabNet, and TabPFN) to plasma proteomic measurements. Gene ontology and pathway enrichment analyses and a literature review were used to evaluate the potential relevance of the identified biomarkers to AD-related mechanisms. The identified biomarkers were also evaluated for enrichment of aging-related biomarkers. The models developed yielded high AUROC and accuracy, mostly >0.9. Proteins selected as predictors by all the models included Angiopoietin-2 (ANG-2), epidermal growth factor (EGF), Interleukin 1α (IL-1α), and platelet-derived growth factor subunit B (PDGF-BB). Ample previous literature supported their relevance in AD. The pool of all the biomarkers identified was significantly enriched with known aging-related biomarkers (p = 0.040). Applying cutting-edge algorithms is expected to be advantageous for developing AD prediction models with plasma proteomic data, and large future studies that externally validate the constructed models in other populations are needed to assess their generalizability. The proteins uncovered may represent novel preventative or therapeutic targets.

(This article belongs to the Special Issue Advances in Molecular Mechanisms of Neurodegenerative Diseases)

Search Results (28)