Search Results (698)

Search Parameters:
Keywords = SHAP framework

34 pages, 5939 KB  
Article
Detection and Classification of Alzheimer’s Disease Using Deep and Machine Learning
by Muhammad Zaeem Khalid, Nida Iqbal, Babar Ali, Jawwad Sami Ur Rahman, Saman Iqbal, Lama Almudaimeegh, Zuhal Y. Hamd and Awadia Gareeballah
Tomography 2026, 12(1), 4; https://doi.org/10.3390/tomography12010004 - 26 Dec 2025
Abstract
Background/Objectives: Alzheimer’s disease is the leading cause of dementia, marked by progressive cognitive decline and a severe socioeconomic burden. Early and accurate diagnosis is crucial to enhancing patient outcomes, yet traditional clinical and imaging assessments are often limited in sensitivity, particularly at early stages. This study presents a dual-modal framework that integrates symptom-based clinical data with magnetic resonance imaging (MRI) using machine learning (ML) and deep learning (DL) models, enhanced by explainable AI (XAI). Methods: Four ML classifiers, namely K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF), were trained on demographic and clinical features. For stage-wise classification, five DL models (CNN, EfficientNetB3, DenseNet-121, ResNet-50, and MobileNetV2) were applied to MRI scans. Interpretability was incorporated through SHAP and Grad-CAM visualizations. Results: Random Forest achieved the highest accuracy of 97% on clinical data, while CNN achieved the best overall performance of 94% in MRI-based staging. SHAP and Grad-CAM were used to find clinically relevant characteristics and brain areas, including hippocampal atrophy and ventricular enlargement. Conclusions: Integrating clinical and imaging data and interpretable AI improves the accuracy and reliability of AD staging. The proposed model offers a valid and clear diagnostic route, which can assist clinicians in making timely diagnoses and adjusting individual treatment.

29 pages, 7545 KB  
Article
Winter Wheat Yield Estimation Under Different Management Practices Using Multi-Source Data Fusion
by Hao Kong, Jingxu Wang, Taiyi Cai, Jun Du, Chang Zhao, Chanjuan Hu and Han Jiang
Agronomy 2026, 16(1), 71; https://doi.org/10.3390/agronomy16010071 (registering DOI) - 25 Dec 2025
Abstract
Accurate crop yield estimation under differentiated management practices is a core requirement for the development of smart agriculture. However, current yield estimation models face two major challenges: limited adaptability to different management practices, thus exhibiting poor generalizability, and ineffective integration of multi-source remote sensing features, limiting further improvements in estimation accuracy. To address these issues, this study integrated UAV-based multispectral and thermal infrared remote sensing data to propose a yield estimation framework based on multi-source feature fusion. First, three machine learning algorithms, namely Partial Least Squares Regression (PLSR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), were employed to retrieve key biochemical parameters of winter wheat. The RF model demonstrated superior performance, with retrieval accuracies for chlorophyll, nitrogen, and phosphorus contents of R² = 0.8347, 0.5914, and 0.9364 and RMSE = 0.2622, 0.4127, and 0.0236, respectively. Subsequently, yield estimation models were constructed by integrating the retrieved biochemical parameters with phenotypic traits such as plant height and biomass. The RF model again exhibited superior performance (R² = 0.66, RMSE = 867.28 kg/ha). SHapley Additive exPlanations (SHAP) analysis identified May chlorophyll content (Chl-5) and March chlorophyll content (Chl-3) as the most critical variables for yield prediction, with stable positive contributions to yield when their values exceeded 2.80 mg/g and 2.50 mg/g, respectively. The quantitative assessment of management practices revealed that the straw return + 50% inorganic fertilizer + 50% organic fertilizer (RIO50) treatment under the combined organic–inorganic fertilization regime achieved the highest measured grain yield (11,469 kg/ha). Consequently, this treatment can be regarded as an optimized practice for attaining high yield. This study confirms that focusing on chlorophyll dynamics during key physiological stages is an effective approach for enhancing yield estimation accuracy under varied management practices, providing a technical basis for precise field management.
(This article belongs to the Section Precision and Digital Agriculture)

15 pages, 2302 KB  
Article
A Day-Ahead Wind Power Dynamic Explainable Prediction Method Based on SHAP Analysis and Mixture of Experts
by Hao Zhang, Guoyuan Qin, Xiangyan Chen, Linhai Lu, Ziliang Zhang and Jiajiong Song
Energies 2026, 19(1), 124; https://doi.org/10.3390/en19010124 - 25 Dec 2025
Abstract
Traditional single-prediction models often exhibit limitations in meeting wind power prediction requirements in complex operational scenarios. Furthermore, the inherent “black-box” nature of deep learning models leads to limited interpretability of predictions, hindering effective support for grid dispatch planning. To address these issues, this study proposes a novel day-ahead wind power prediction method, referred to as SHapley Additive exPlanations (SHAP)–Mixture of Experts (MoE), which integrates SHAP into an MoE framework. Here, SHAP is employed for interpretability purposes. This study innovatively transforms SHAP analysis into prior knowledge to guide the decision-making of the MoE gating network and proposes a two-layer dynamic interpretation mechanism based on the collaborative analysis of gating weights and SHAP values. This approach clarifies key meteorological factors and the model’s advantageous scenarios, while quantifying the uncertainty among multiple expert decisions. Firstly, each expert model was pre-trained, and its parameters were frozen to construct a candidate expert pool. Secondly, the SHAP vectors for each pre-trained expert were computed over all sample features to characterize their decision-making logic under varying scenarios. Thirdly, an augmented feature set was constructed by fusing the original meteorological features with SHAP attribution matrices from all experts; this set was used to train the gating network within the MoE framework. Finally, for new input samples, each frozen expert model generates a prediction along with its corresponding SHAP vector, and the gating network aggregates these predictions to produce the final forecast. The proposed method was validated using operational data from an offshore wind farm located in southeastern China. Compared with the best individual expert model and traditional ensemble forecasting models, the proposed method reduces the Root Mean Square Error (RMSE) by 0.23% to 4.92%. Furthermore, the method elucidates the influence of key features on each expert’s decisions, offering insights into how the gating network adaptively selects experts based on the input features and expert-specific characteristics across different scenarios.
(This article belongs to the Topic Advances in Wind Energy Technology: 2nd Edition)
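
The SHAP vectors mentioned in the abstract above are Shapley-value attributions. As a minimal sketch, and not the authors' implementation, the following computes exact Shapley values for a toy three-feature model by enumerating feature coalitions. The model `toy_power`, the feature values, and the baseline are hypothetical, and features outside a coalition are replaced by baseline values, which is only one of several conventions used in practice.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample by enumerating feature coalitions.

    Features outside a coalition take their baseline value (an assumption;
    practical SHAP explainers use other background schemes as well).
    """
    n = len(x)
    features = list(range(n))

    def value(coalition):
        # Coalition features come from x, all others from the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return model(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_power(f):
    """Hypothetical stand-in for an expert model, with one interaction term."""
    wind, density, temp = f
    return 2.0 * wind + 0.5 * density + 0.1 * wind * temp


x = [8.0, 1.2, 15.0]          # hypothetical sample
baseline = [5.0, 1.0, 10.0]   # hypothetical reference point
phi = shapley_values(toy_power, x, baseline)

# Efficiency property: attributions sum to f(x) - f(baseline).
assert abs(sum(phi) - (toy_power(x) - toy_power(baseline))) < 1e-9
```

Exact enumeration grows exponentially with the number of features, which is why approximations are normally used for real models; the sketch is only meant to make the attribution idea concrete.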

27 pages, 6795 KB  
Article
Short-Term Wind Power Prediction Model Based on SVMD-KANCNN-BiLSTM
by Xinyue Li, Yu Xin, Youming Huo, Zhuoxi Li, Yi Gu, Xi He and Xu Zhou
Sustainability 2026, 18(1), 246; https://doi.org/10.3390/su18010246 - 25 Dec 2025
Abstract
The large-scale integration of wind power generation, as an important sustainable energy, into the power grid relies on the support of the power system, and accurate wind power prediction is the key to ensuring the continuous and stable operation of the power system. Therefore, this paper proposes a hybrid wind power prediction model that integrates Successive Variational Mode Decomposition (SVMD) with KANCNN-BiLSTM. To address data volatility, the original wind power sequence is decomposed into seven modal components using SVMD. Subsequently, for enhanced capability in capturing nonlinear relationships, a KAN linear layer is integrated into a convolutional neural network, constructing the KANCNN-BiLSTM model for component prediction. Simultaneously, model hyperparameters are optimized via the Optuna framework to further improve predictive performance. Additionally, SHAP theory is applied to interpret the contribution of each component to the prediction results, thereby enhancing the transparency of the decomposition–integration process. Experimental results indicate that the proposed interpretable SVMD-KANCNN-BiLSTM wind power prediction model achieves a prediction accuracy of 0.998959, outperforms all comparison models across multiple evaluation metrics, and indicates superior predictive capability; additionally, the global interpretability analysis reveals that all IMF components positively contribute to the model’s predictions. The establishment of this model provides an interpretable new approach for realizing wind power prediction.
(This article belongs to the Section Energy Sustainability)

39 pages, 4328 KB  
Article
Spatial Mechanisms and Coupling Coordination of Cultural Heritage and Tourism Along the Jinzhong Segment of the Great Tea Road
by Lihao Meng, Zunni Du, Zehui Jia and Lei Cao
Heritage 2026, 9(1), 7; https://doi.org/10.3390/heritage9010007 - 25 Dec 2025
Abstract
Linear cultural heritage is characterized by complex cross-regional and multi-level features, facing severe challenges of spatial resource fragmentation and an imbalance in cultural and tourism functions. However, existing research lacks quantitative analysis regarding the non-linear driving mechanisms of spatial distribution and the misalignment of culture–tourism coupling. In this study, we construct an integrated identification–explanation–coupling–governance (IECG) theoretical framework. Taking The Great Tea Road (Jinzhong Section) as a case study, our framework integrates the CCSPM, XGBoost-SHAP machine learning interpreter, and Geodetector to systematically quantify the spatial structure of heritage and the level of culture–tourism integration. The results indicate that (1) in terms of spatial patterns, the study area exhibits an unbalanced agglomeration characteristic of “dual-primary and dual-secondary cores,” with high-density areas showing significant orientation along rivers and roads; (2) regarding driving mechanisms, the machine learning model reveals a significant “non-linear threshold effect,” with 83% of driving factors (e.g., elevation and distance to transportation) exhibiting non-linear fluctuations in their influence on heritage distribution; and (3) in terms of culture–tourism coupling, the overall coupling coordination degree (CCD) is low (mean 0.38), indicating significant “resource–facility” spatial misalignment. The modern number of public cultural facilities (NCF) is identified as the primary obstacle restricting the transformation of high-grade heritage into tourism products. Based on these findings, we propose adaptive zoning governance strategies. This research not only theoretically clarifies the complexity of the social–ecological system of linear heritage but also provides a generalizable quantitative method for the digital protection and sustainable tourism planning of cross-regional cultural heritage.
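
The abstract above reports a coupling coordination degree (CCD) but does not state how it was computed. As a hedged illustration only, the sketch below uses a widely cited two-subsystem form: coupling C = 2·sqrt(U1·U2)/(U1 + U2), comprehensive index T = alpha·U1 + beta·U2, and coordination D = sqrt(C·T). The equal weights, the function name, and the example scores are assumptions, not values taken from the article.

```python
from math import sqrt


def coupling_coordination(u_culture, u_tourism, alpha=0.5, beta=0.5):
    """Return (C, T, D) for two normalized subsystem scores in [0, 1].

    alpha and beta are subsystem weights (equal weights assumed here).
    """
    if not (0.0 <= u_culture <= 1.0 and 0.0 <= u_tourism <= 1.0):
        raise ValueError("scores must be normalized to [0, 1]")
    total = u_culture + u_tourism
    if total == 0.0:
        # Both subsystems score zero: define all three indices as zero.
        return 0.0, 0.0, 0.0
    coupling = 2.0 * sqrt(u_culture * u_tourism) / total
    comprehensive = alpha * u_culture + beta * u_tourism
    coordination = sqrt(coupling * comprehensive)
    return coupling, comprehensive, coordination


# Hypothetical spatial unit: strong heritage resources, weak tourism facilities.
c, t, d = coupling_coordination(0.6, 0.2)
```

In this form, unequal subsystem scores lower C even when their average is moderate, which is one way a "resource–facility" mismatch would show up as a low D.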
36 pages, 7158 KB  
Article
Towards Sustainable Heritage Conservation: A Hybrid Landslide Susceptibility Mapping Framework in Japan’s UNESCO Mountain Villages
by Ahmed Bassem, Hassan Shokry, Shinjiro Kanae and Mahmoud Sharaan
Sustainability 2026, 18(1), 237; https://doi.org/10.3390/su18010237 - 25 Dec 2025
Abstract
Sustainable management of cultural heritage in mountainous regions requires effective strategies to mitigate natural hazards such as landslides. Landslide susceptibility mapping (LSM) provides a critical tool to support these conservation efforts. This study presents a hybrid framework that integrates probabilistic slope stability modeling with ensemble learning for LSM in the UNESCO World Heritage sites of Shirakawa-gō and Gokayama, Japan. The framework uses probabilities of failure from Bishop’s simplified method combined with Monte Carlo simulations to guide non-landslide sample selection. An enhanced tri-parametric optimization was applied to refine the slope unit segmentation process. SHAP analysis revealed that the hybrid framework emphasizes physically meaningful features such as rainfall. The proposed method results in AUC gains of 0.072 for XGBoost, 0.066 for CatBoost, and 0.063 for LightGBM compared to their buffer-based counterparts. Future landslide susceptibility was mapped based on the 2035 precipitation projections from ARIMA time-series modeling. By enhancing accuracy, interpretability, and geotechnical consistency, the proposed approach delivers a robust tool for sustainable risk management. The study further evaluates the exposure of Gasshō-style houses and other historic buildings to varying levels of landslide susceptibility, offering actionable insights for local planning and heritage conservation.
(This article belongs to the Section Hazards and Sustainability)
17 pages, 3508 KB  
Article
Precise Discrimination Between Rape Honey and Acacia Honey Based on Sugar and Amino Acid Profiles Combined with Machine Learning
by Chenyu Sun, Fei Pan, Wenli Tian, Zongyan Cui, Xiaofeng Xue and Yitian Xu
Foods 2026, 15(1), 70; https://doi.org/10.3390/foods15010070 - 25 Dec 2025
Abstract
Honey variety authentication is critical for ensuring market integrity and protecting consumer rights, especially for high-value unifloral honeys, such as acacia honey, which are frequently adulterated with low-value alternatives such as rape honey due to their similar visual appearance. The aim of this study was to develop a method for precise discrimination between rape honey and acacia honey using their chemical profiles combined with machine learning. A total of 542 honey samples were collected from major beekeeping regions in China. Targeted quantification of 12 sugars and 20 amino acids was performed using UPLC-MS/MS. Multivariate analysis revealed significant differences in sugar and amino acid compositions between the two honey types, though partial samples overlapped due to chemical similarity. Six machine learning algorithms, including the Multilayer Perceptron, were employed for classification. Optimization was performed via 10-fold cross-validation and ADASYN oversampling, yielding optimal performance of 98% and 100% prediction accuracies for rape honey and acacia honey, respectively, on the independent test set. SHAP (Shapley Additive Explanations) analysis identified key differential markers, including fructose, turanose, glucose, and GABA, which contributed most to the classification. Furthermore, a user-friendly web application was developed to facilitate rapid on-site authentication. This study provides an innovative technical framework for honey variety discrimination, with potential applications in quality control and anti-fraud practices.

22 pages, 8743 KB  
Article
Deep Learning-Based State Estimation for Sodium-Ion Batteries Using Long Short-Term Memory Network
by Yunzhe Li, Yuhao Li, Jiangong Zhu, Haifeng Dai, Zhi Li and Bo Jiang
Batteries 2026, 12(1), 6; https://doi.org/10.3390/batteries12010006 - 25 Dec 2025
Abstract
Sodium-ion batteries (SIBs) have attracted growing attention as an alternative to lithium-ion technologies for electric mobility and stationary energy-storage applications, owing to the wide availability of sodium resources, cost advantages, and comparatively favorable safety characteristics. Accurate state-of-health (SOH) estimation is essential for safe and reliable SIB deployment, yet existing data-driven methods still suffer from limited accuracy and interpretability, as well as a lack of dedicated aging datasets. This study proposes an explainable SOH estimation methodology based on a long short-term memory (LSTM) network combined with model-agnostic KernelSHAP analysis. Thirteen health indicators (HIs) are extracted from charge/discharge data and post-charge relaxation segments, and the most relevant indicators are selected via Pearson correlation screening as model inputs. Built on these HIs, an LSTM-based multi-step framework is developed to take HI sequences as input and forecast the SOH trajectory over the subsequent 20 cycles. Experimental results show that the proposed method achieves high accuracy and robust cross-cell generalization, with mean absolute error (MAE) below 1.0%, root-mean-square error (RMSE) below 1.2% across all cells, and an average RMSE of about 0.75% in the main cross-cell setting. KernelSHAP-based global and temporal analyses further clarify how different HIs and time positions influence SOH estimates, enhancing model transparency and physical interpretability.
(This article belongs to the Special Issue Control, Modelling, and Management of Batteries)
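
The Pearson correlation screening step described in the abstract above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: the indicator names, the data, and the 0.8 cutoff are hypothetical, and the abstract does not say which threshold was actually used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant series carries no linear information
    return cov / sqrt(var_x * var_y)


def screen_indicators(indicators, soh, threshold=0.8):
    """Keep indicators whose |r| with SOH meets the (hypothetical) threshold."""
    scores = {name: pearson_r(values, soh) for name, values in indicators.items()}
    selected = [name for name, r in scores.items() if abs(r) >= threshold]
    return selected, scores


# Hypothetical per-cycle data for one cell.
soh = [1.00, 0.98, 0.96, 0.94, 0.92]
indicators = {
    "charge_time": [100, 98, 96, 94, 92],
    "relaxation_voltage_drop": [0.010, 0.012, 0.013, 0.015, 0.017],
    "noise": [0.3, 0.1, 0.4, 0.1, 0.3],
}
selected, scores = screen_indicators(indicators, soh)
```

Screening on the absolute value keeps indicators that fall as SOH falls and those that rise as SOH falls; the sign is preserved in `scores` for later interpretation.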

18 pages, 2326 KB  
Article
Explainable AutoML with Uncertainty Quantification for CO2-Cured Concrete Compressive Strength Prediction
by Liping Wang, Yuanfeng Wang, Chengcheng Shi, Baolong Ma, Yinshan Liu, Boqun Zhang, Shaoqin Xue, Xinlei Chang and Xiaodong Liu
Buildings 2026, 16(1), 89; https://doi.org/10.3390/buildings16010089 - 24 Dec 2025
Abstract
The cement and concrete industry is one of the primary sources of anthropogenic carbon dioxide (CO2) emissions globally, responsible for nearly 8% of total emissions, making the need for a low-carbon transition urgent. CO2 curing provides both strength enhancement and carbon sequestration, yet the compressive strength of such concrete remains challenging to predict due to limited and strongly coupled experimental factors. This study developed an explainable Automated Machine Learning (AutoML) framework with integrated uncertainty quantification to predict the 28-day compressive strength of CO2-cured concrete. The framework was built using 198 standardized experimental data and trained with four algorithms: Random Forest (RF), Support Vector Regression (SVR), eXtreme Gradient Boosting (XGBoost), and the transformer-based Tabular Prior-Data Fitted Network (TabPFN). To enhance model accuracy and efficiency, stratified cross-validation, hyperparameter optimization, and bootstrap-based uncertainty analysis were applied during training. The results show that TabPFN achieves the highest predictive accuracy (test R² = 0.959) and maintains a stable 95% prediction interval. SHapley Additive exPlanations (SHAP) indicates that cement content, aggregate composition, water–binder (W/B) ratio, and CO2 curing time are the dominant factors, with an optimal W/B ratio near 0.40. Interaction analysis further reveals synergistic effects between cement content and W/B, and a strengthening coupling between curing time and CO2 concentration at longer durations. The framework enhances predictive reliability and explainability, supporting mixture design and curing optimization for low-carbon concrete development.
(This article belongs to the Section Building Materials, and Repair & Renovation)

14 pages, 3352 KB  
Article
An XGBoost-Based Morphometric Classification System for Automatic Subspecies Identification of Apis mellifera
by Miaoran Zhang, Yali Du, Xiaoyin Deng, Jinming He, Haibin Jiang, Yuling Liu, Jingyu Hao, Peng Chen, Kai Xu and Qingsheng Niu
Insects 2026, 17(1), 27; https://doi.org/10.3390/insects17010027 - 24 Dec 2025
Abstract
The conservation and breeding of the western honey bee (Apis mellifera) depend centrally on accurate subspecies assignment, but the most commonly used methods are labor-intensive classical morphometrics and costly molecular assays. We developed an XGBoost-based classification framework using a compact set of routinely measurable characters. A curated dataset of labeled workers was measured under harmonized protocols; features were screened according to embedded importance, and model performance was assessed using five-fold cross-validation, outperforming standard machine learning baselines. The resulting model using only the top 10 characters (primarily forewing venation angles and abdominal plate metrics) achieved high performance (accuracy = 0.98; F1 = 0.99) and an area under the receiver operating characteristic curve (AUC) of 0.99 (95% CI = 0.995–0.999). SHAP analyses confirmed the discriminatory contributions of these features, while error inspection suggested that misclassifications were concentrated in morphologically overlapping lineages. The model’s performance supports its use as a rapid triage tool alongside genetic testing, providing a scalable and interpretable tool for researchers to create and deploy custom morphometric models, demonstrated here for A. mellifera but portable to other insect taxa.
(This article belongs to the Special Issue Biology and Conservation of Honey Bees)

32 pages, 1696 KB  
Article
Financial Statement Fraud Detection Through an Integrated Machine Learning and Explainable AI Framework
by Tsolmon Sodnomdavaa and Gunjargal Lkhagvadorj
J. Risk Financial Manag. 2026, 19(1), 13; https://doi.org/10.3390/jrfm19010013 - 24 Dec 2025
Abstract
Financial statement fraud remains a substantial risk in environments marked by weak regulatory oversight and information asymmetry. This study develops a decision-centric framework that integrates machine learning, explainable artificial intelligence, and decision curve analysis to improve fraud detection under severe class imbalance. Using 969 firm-year observations from 132 Mongolian firms (2013–2024), we evaluate 21 financial ratios with models including Random Forest, XGBoost, LightGBM, MLP, TabNet, and a Stacking Ensemble trained with SMOTE and class-weighted learning. Performance was assessed using PR-AUC, F1-score, Recall, and DeLong-based significance testing. The Stacking Ensemble achieved the strongest results (PR-AUC = 0.93; F1 = 0.83), outperforming both classical and modern baseline models. Interpretability analyses (SHAP, LIME, and counterfactual explanations) consistently identified leverage, profitability, and liquidity indicators as dominant drivers of fraud risk, supported by a SHAP Stability Index of 0.87. Decision curve analysis showed that calibrated thresholds improved decision efficiency by 7–9% and reduced over-audit costs by 3–4%, while an audit cost simulation estimated annual savings of 80–100 million MNT. Overall, the proposed ML–XAI–DCA framework offers a transparent, interpretable, and cost-efficient approach for enhancing fraud detection in emerging-market contexts with limited textual disclosures.

31 pages, 2989 KB  
Article
Percentile-Based Outbreak Thresholding for Machine Learning-Driven Pest Forecasting in Rice (Oryza sativa L.) Farming: A Case Study on Rice Black Bug (Scotinophara coarctata F.) and the White Stemborer (Scirpophaga innotata W.)
by Gina D. Balleras, Sailila E. Abdula, Cristine G. Flores and Reymark D. Deleña
Sustainability 2026, 18(1), 182; https://doi.org/10.3390/su18010182 - 24 Dec 2025
Abstract
Rice (Oryza sativa L.) production in the Philippines remains highly vulnerable to recurrent outbreaks of the Rice Black Bug (RBB; Scotinophara coarctata F.) and White Stemborer (WSB; Scirpophaga innotata W.), two of the most destructive pests in Southeast Asian rice ecosystems. Classical economic threshold levels (ETLs) are difficult to estimate in smallholder settings due to the lack of cost–loss data, often leading to either delayed or excessive pesticide application. To address this, the present study developed an adaptive outbreak-forecasting framework that integrates the Number–Size (N–S) fractal model with machine learning (ML) classifiers to define and predict pest regime transitions. Seven years (2018–2024) of light-trap surveillance data from the Philippine Rice Research Institute–Midsayap Experimental Station were combined with daily climate variables from the NASA POWER database, including air temperature, humidity, precipitation, wind, soil moisture, and lunar phase. The N–S fractal model identified natural breakpoints in the log–log cumulative frequency of pest counts, yielding early-warning and severe-outbreak thresholds of 134 and 250 individuals for WSB and 575 and 11,383 individuals for RBB, respectively. Eight ML algorithms (Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, Balanced Bagging, LightGBM, XGBoost, and CatBoost) were trained on variance-inflation-filtered climatic and temporal predictors. Among these, CatBoost achieved the highest predictive performance for WSB at the 94.3rd percentile (accuracy = 0.932, F1 = 0.545, ROC–AUC = 0.957), while Logistic Regression performed best for RBB at the 75.1st percentile (F1 = 0.520, ROC–AUC = 0.716). SHAP (SHapley Additive exPlanations) analysis revealed that outbreak probability increases under warm nighttime temperatures, high surface soil moisture, moderate humidity, and calm wind conditions, with lunar phase exerting additional modulation of nocturnal pest activity. The integrated fractal–ML approach thus provides a statistically defensible and ecologically interpretable basis for adaptive pest surveillance. It offers an early-warning system that supports data-driven integrated pest management (IPM), reduces unnecessary pesticide use, and strengthens climate resilience in Philippine rice ecosystems.
(This article belongs to the Special Issue Advanced Agricultural Economy: Challenges and Opportunities)
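
The percentile-based outbreak labeling named in the title above can be illustrated with a short sketch. It is a simplified assumption of how such labels might be derived: trap counts above a chosen percentile are marked as outbreak events that a classifier could then be trained to predict. The counts and the 80th-percentile cutoff are hypothetical, and the sketch does not reproduce the N–S fractal breakpoint analysis described in the abstract.

```python
def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between order statistics."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 100.0:
        raise ValueError("q must be between 0 and 100")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def label_outbreaks(counts, q):
    """Return the threshold and binary labels (1 = count above the percentile)."""
    threshold = percentile(counts, q)
    labels = [1 if count > threshold else 0 for count in counts]
    return threshold, labels


# Hypothetical daily light-trap counts for one pest.
counts = [3, 10, 0, 250, 12, 7, 40, 5, 600, 9]
threshold, labels = label_outbreaks(counts, 80.0)
```

Raising the percentile makes outbreak labels rarer, which increases class imbalance; that trade-off is one reason a metric such as F1 is informative alongside accuracy.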

29 pages, 8289 KB  
Article
Clustering as a Prerequisite for Reliable Machine Learning Prediction of Multi-Odor Systems in Wastewater Treatment
by Su-chul Yoon, Chae-ho Kim and Dong-chul Shin
Atmosphere 2026, 17(1), 18; https://doi.org/10.3390/atmos17010018 - 23 Dec 2025
Abstract
Complex odor emissions from wastewater treatment plants consist of multiple volatile compounds that exhibit heterogeneous temporal dynamics and low linear correlations, making accurate prediction and interpretation difficult when analyzed on a single-compound basis. This study investigates whether clustering can serve not only as an exploratory tool but as an essential preprocessing step to enhance machine-learning performance in multi-odor prediction systems. A total of 22 designated odorants were continuously monitored, and their pairwise dependencies were evaluated using Pearson correlation and mutual information. Data-driven clustering was performed through K-means, hierarchical linkage, and principal-component–based latent grouping, and the resulting structures were quantitatively compared with functional-group-based chemical classifications using the consistency ratio and Jaccard similarity index. Cluster validity was further examined using the Silhouette Coefficient, Davies–Bouldin Index, and Calinski–Harabasz Index. The predictive contribution of clustering was verified by training XGBoost regression models on both raw and cluster-structured datasets. The clustered dataset yielded higher predictive accuracy, with increased R² and reduced MAE and RMSE across most odorants. SHAP analysis further confirmed that clustering improved model interpretability by stabilizing feature contributions and reducing noise-driven importance shifts. The findings demonstrate that clustering is not a supplementary diagnostic tool, but a prerequisite for building reliable, high-performance machine-learning models in complex odor systems. This integrative framework offers a methodological foundation for multi-odor forecasting, source tracking, and next-generation odor management platforms.
(This article belongs to the Special Issue Environmental Odour (2nd Edition))
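The abstract above mentions comparing data-driven clusters with functional-group-based chemical classes using the Jaccard similarity index. The following is a minimal sketch of that kind of comparison, not the authors' procedure: the helper names (`jaccard`, `best_match_jaccard`), the odorant names, and the groupings are all hypothetical and chosen only for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention assumed here: two empty sets count as identical
    return len(a & b) / len(union)


def best_match_jaccard(data_clusters, chemical_groups):
    """Map each data-driven cluster to its best-matching chemical group and score."""
    result = {}
    for cluster_name, members in data_clusters.items():
        group_name, group_members = max(
            chemical_groups.items(), key=lambda kv: jaccard(members, kv[1])
        )
        result[cluster_name] = (group_name, jaccard(members, group_members))
    return result


# Hypothetical groupings, NOT taken from the study.
data_clusters = {
    "cluster_0": {"hydrogen sulfide", "methyl mercaptan", "ammonia"},
    "cluster_1": {"trimethylamine", "dimethyl sulfide"},
}
chemical_groups = {
    "sulfur": {"hydrogen sulfide", "methyl mercaptan", "dimethyl sulfide"},
    "nitrogen": {"ammonia", "trimethylamine"},
}

matches = best_match_jaccard(data_clusters, chemical_groups)
for cluster_name, (group_name, score) in matches.items():
    print(f"{cluster_name}: best match {group_name} (Jaccard = {score:.3f})")
```

A score of 1 would mean a data-driven cluster coincides exactly with a chemical class, while low scores suggest the temporal behavior of the compounds cuts across functional groups.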
28 pages, 3264 KB  
Article
A Unified Fuzzy–Explainable AI Framework (FAS-XAI) for Customer Service Value Prediction and Strategic Decision-Making
by Gabriel Marín Díaz
AI 2026, 7(1), 3; https://doi.org/10.3390/ai7010003 - 22 Dec 2025
Viewed by 189
Abstract
Real-world decision-making often involves uncertainty, incomplete data, and the need to evaluate alternatives based on both quantitative and qualitative criteria. To address these challenges, this study presents FAS-XAI, a unified methodological framework that integrates fuzzy clustering and explainable artificial intelligence (XAI). FAS-XAI supports interpretable, data-driven decision-making by combining three key components: fuzzy clustering to uncover latent behavioral profiles under ambiguity, supervised prediction models to estimate decision outcomes, and expert-guided interpretation to contextualize results and enhance transparency. The framework ensures both global and local interpretability through SHAP, LIME, and ELI5, placing human reasoning and transparency at the center of intelligent decision systems. To demonstrate its applicability, FAS-XAI is applied to a real-world B2B customer service dataset from a global ERP software distributor. Customer engagement is modeled using the RFID approach (Recency, Frequency, Importance, Duration), with Fuzzy C-Means employed to identify overlapping customer profiles and XGBoost models predicting attrition risk with explainable outputs. This case study illustrates the coherence, interpretability, and operational value of the FAS-XAI methodology in managing customer relationships and supporting strategic decision-making. Finally, the study reflects on additional applications across education, physics, and industry, positioning FAS-XAI as a general-purpose, human-centered framework for transparent, explainable, and adaptive decision-making across domains.
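To make the idea of overlapping customer profiles concrete, here is a minimal sketch of the standard Fuzzy C-Means updates (membership step and center step) in plain Python. It is not the FAS-XAI implementation: the function names, the fuzzifier `m = 2.0`, the fixed iteration count, the initial centers, and the two-feature toy data are all assumptions made for illustration.

```python
import math


def memberships(points, centers, m=2.0):
    """Return u[i][k], the degree to which point i belongs to cluster k (rows sum to 1)."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        if any(x == 0.0 for x in d):
            # Point sits exactly on one or more centers: share full membership among them.
            hits = [x == 0.0 for x in d]
            u.append([1.0 / sum(hits) if h else 0.0 for h in hits])
            continue
        u.append([1.0 / sum((dk / dj) ** exponent for dj in d) for dk in d])
    return u


def update_centers(points, u, m=2.0):
    """Recompute each center as the mean of all points weighted by u[i][k] ** m."""
    dims = len(points[0])
    centers = []
    for k in range(len(u[0])):
        w = [row[k] ** m for row in u]
        total = sum(w)
        centers.append(
            tuple(sum(wi * p[d] for wi, p in zip(w, points)) / total for d in range(dims))
        )
    return centers


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)


# Hypothetical, already-scaled two-feature customer data (not from the study).
points = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.5, 0.5)]
centers, u = fuzzy_c_means(points, centers=[(0.0, 0.0), (1.0, 1.0)])
for p, row in zip(points, u):
    print(p, [round(x, 3) for x in row])
```

Unlike hard K-means assignments, each row of `u` spreads a customer across clusters, so a customer lying between two profiles (the last point here) receives a comparable membership in both.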
39 pages, 2216 KB  
Article
A Dual-Model Framework for Writing Assessment: A Cross-Sectional Interpretive Machine Learning Analysis of Linguistic Features
by Cheng Tang, George Engelhard, Yinying Liu and Jiawei Xiong
Data 2026, 11(1), 2; https://doi.org/10.3390/data11010002 - 21 Dec 2025
Viewed by 90
Abstract
Constructed-response items offer rich evidence of writing proficiency, but the linguistic signals they contain vary with grade level. This study presents a cross-sectional analysis of 5638 English Language Arts essays from Grades 6–12 to identify which linguistic features predict proficiency and to characterize how their importance shifts across grade levels. We extracted a suite of lexical, syntactic, and semantic-cohesion features, and evaluated their predictive power using an interpretive dual-model framework combining LASSO and XGBoost algorithms. Feature importance was assessed through LASSO coefficients, XGBoost Gain scores, and SHAP values, and interpreted by isolating both the consensus and the divergences among the three metrics. Results show moderate, generalizable predictive signals in Grades 6–8, but no generalizable predictive power was found in the Grades 9–12 cohort. Across the middle grades, three findings achieved strong consensus: essay length, syntactic density, and global semantic organization served as strong predictors of writing proficiency. Lexical diversity emerged as a key divergent feature: it was a top predictor for XGBoost but ignored by LASSO, suggesting its contribution depends on interactions with other features. These findings inform actionable, grade-sensitive feedback, highlighting stable, diagnostic targets for middle school while cautioning that discourse-level features are necessary to model high-school writing.
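One simple way to isolate consensus and divergence among several importance metrics is to compare their top-k feature sets. The sketch below illustrates that idea only; it is not the authors' procedure. The function names, the cutoff `k = 4`, the feature names, and every number are invented for illustration and are not results from the study.

```python
def top_k(scores, k):
    """Return the k features with the largest absolute importance."""
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return set(ranked[:k])


def consensus_and_divergence(metrics, k):
    """Split features into those every metric ranks in its top k and those only some do."""
    tops = [top_k(scores, k) for scores in metrics.values()]
    consensus = set.intersection(*tops)
    divergent = set.union(*tops) - consensus
    return consensus, divergent


# Invented importance values, NOT taken from the study.
metrics = {
    "lasso_coef": {
        "essay_length": 0.80, "syntactic_density": 0.50,
        "semantic_organization": 0.40, "spelling": 0.10, "lexical_diversity": 0.00,
    },
    "xgb_gain": {
        "essay_length": 0.40, "lexical_diversity": 0.30,
        "syntactic_density": 0.15, "semantic_organization": 0.10, "spelling": 0.05,
    },
    "mean_abs_shap": {
        "essay_length": 0.50, "syntactic_density": 0.20,
        "semantic_organization": 0.15, "lexical_diversity": 0.10, "spelling": 0.05,
    },
}

consensus, divergent = consensus_and_divergence(metrics, k=4)
print("consensus:", sorted(consensus))
print("divergent:", sorted(divergent))
```

A feature that a linear model zeroes out but a tree ensemble ranks highly lands in the divergent set, which is the pattern the abstract describes for lexical diversity.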