In recent years, with the significant decline in fine particulate matter (PM₂.₅) concentrations, ozone (O₃) has emerged as a major composite air pollutant during the warm season in China, attracting increasing attention due to its associated health burden and economic costs. This study focuses on Tianjin, using ozone monitoring data from 2017 to 2023 combined with health statistics to assess the health impacts and economic losses attributable to ozone pollution. First, ozone exposure indicators and compliance criteria were constructed based on national air quality standards, and the interannual variation and spatial differences of O₃ levels were analyzed at both citywide and district scales. Second, multiple machine learning classification models, including logistic regression, decision tree, k-nearest neighbors, and gradient boosting, were developed using ozone and meteorological variables to predict the occurrence risks of five diseases: cardiovascular diseases, respiratory diseases, hand-foot-and-mouth disease (HFMD), influenza, and dengue fever. Finally, excess cases were estimated using health impact functions, and the associated economic losses were quantified by combining the value of a statistical life (VSL) with cost-of-illness and willingness-to-pay (WTP) approaches. The results showed that the annual evaluation value of ozone in Tianjin, defined as the 90th percentile of the daily maximum 8 h average O₃ concentration, exhibited a pattern of initially increasing, then decreasing, and subsequently rebounding. It peaked at 201 µg/m³ in 2018, declined to a minimum of 164 µg/m³ in 2021, and rebounded to 188 µg/m³ in 2023. Machine-learning results indicated that the logistic regression model showed relatively stable overall performance across predictions of different diseases, while the gradient boosting tree model also achieved high accuracy in predicting certain infectious diseases.
Overall, ozone pollution exhibits significant heterogeneous effects across different disease types, and the associated health-related economic losses show stage-wise fluctuations in response to pollution levels. Based on these findings, it is recommended to implement refined control measures during periods of high ozone exceedance and in key regions, while strengthening protection for vulnerable populations such as the elderly, children, and patients with respiratory diseases, in order to achieve synergistic improvements in air quality management and public health outcomes. Full article
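
The annual evaluation value described in this abstract (the 90th percentile of daily maximum 8 h average O₃) can be illustrated with a short sketch. This is a minimal example, not the study's procedure: the function name `percentile_linear`, the choice of linear interpolation between order statistics, and the sample values are all assumptions made for illustration.

```python
import math

def percentile_linear(values, fraction):
    """Percentile via linear interpolation between order statistics.

    The interpolation rule is an assumption; the abstract does not say
    which percentile convention the study applied.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    position = (len(ordered) - 1) * fraction  # zero-based fractional rank
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)  # guard the top end
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight

# Hypothetical daily maximum 8 h averages in µg/m³ (not data from the study).
mda8 = [80, 95, 110, 120, 135, 150, 160, 170, 185, 200, 210]
annual_evaluation_value = percentile_linear(mda8, 0.90)
```

In practice the input would hold one value per valid monitoring day in the year, so the statistic reflects the upper tail of the ozone season rather than the annual mean.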

(This article belongs to the Special Issue Air Quality and Its Impacts on Public Health)

22 pages, 3297 KB

Open Access Article

Explainable Artificial Intelligence (XAI) for Identifying the Integration of International Students in the Host Country and Its Culture

by James Vakilian, Fareed Ud Din, Edmund J. Sadgrove, Mohammadreza Haghighat and Niusha Shafiabady

AI 2026, 7(7), 238; https://doi.org/10.3390/ai7070238 (registering DOI) - 25 Jun 2026

Abstract

The integration of international students into host countries and their cultures is a multifaceted challenge that significantly impacts their academic success and well-being. This study leverages Explainable Artificial Intelligence (XAI) to model and interpret variables associated with the self-rated integration of 175 international students at Charles Darwin University (CDU) in Australia, using data from a 42-question survey. Employing machine learning models, including Decision Tree (DT) and Gradient Boosting Machine (GBM), we use XAI techniques to identify variables most strongly associated with students’ self-rated integration, including career confidence, perceived future happiness, and perceived career obstacles. SHAP analyses and partial dependence plots provide global and instance-level insights, revealing both the magnitude and directional effects of these features. The findings highlight the predictive relevance of psychological and social variables in students’ self-rated integration, offering exploratory insights that inform targeted support programs. By enhancing model transparency through XAI, this research fosters trust in AI-driven educational interventions, addressing ethical considerations and promoting equitable outcomes for diverse student populations. Full article

(This article belongs to the Topic Explainable AI in Education)

23 pages, 7216 KB

Open Access Article

A ChiMerge–WOE Ensemble Learning Framework for Landslide Susceptibility Assessment in Jiuzhaigou County, China

by Yujie Liu, Lili Zhang, Yaowen Zhang, Yunsheng Yao and Zhicheng Bao

Sustainability 2026, 18(13), 6488; https://doi.org/10.3390/su18136488 (registering DOI) - 25 Jun 2026

Abstract

Landslide susceptibility assessment is important for disaster prevention and sustainable land-use planning in mountainous regions. However, conventional discretization methods often overlook threshold effects in conditioning factors, and many machine learning models still have limited interpretability. This study develops an integrated framework that combines ChiMerge discretization, Weight of Evidence (WOE) transformation, and tree-based ensemble learning to map landslide susceptibility in Jiuzhaigou County, Sichuan Province, China. A landslide inventory of 164 points was compiled from field investigations and hazard records, and fourteen topographic, geological, and environmental conditioning factors were derived from multi-source spatial datasets. Continuous factors were discretized using ChiMerge, a supervised chi-square-based discretization method that identifies statistically meaningful thresholds according to the distributions of landslide and non-landslide samples. WOE values were then calculated to quantify the association between each factor class and landslide occurrence. Three WOE-based ensemble models, WOE-CatBoost, WOE-LightGBM, and WOE-RF, were constructed and compared. All models showed high predictive performance (AUC > 0.90), with WOE-CatBoost performing best (AUC = 0.9432). Its high and very high susceptibility zones covered 28.59% of the study area but contained 85.96% of observed landslides. High-risk areas were mainly concentrated in steep valleys, fractured lithological zones, erosion belts, and areas affected by engineering activities, such as road construction, slope cutting, tourism infrastructure development, and settlement expansion. The proposed framework improves prediction accuracy and interpretability and provides spatial support for landslide prevention and sustainable land-use management. Full article
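
The Weight of Evidence step mentioned in this abstract can be sketched as follows. This is a minimal example under stated assumptions: it uses one common WOE definition (log ratio of the share of landslide samples to the share of non-landslide samples in each class), the `smoothing` term is an added convenience to avoid taking the log of zero, and the function name is hypothetical. The abstract does not give the paper's exact formulation.

```python
import math

def weight_of_evidence(event_counts, nonevent_counts, smoothing=0.5):
    """WOE per factor class: ln(share of events / share of non-events).

    event_counts[i] and nonevent_counts[i] are the landslide and
    non-landslide sample counts falling in class i of one factor.
    smoothing is an assumed additive correction for empty classes.
    """
    n_classes = len(event_counts)
    total_events = sum(event_counts) + smoothing * n_classes
    total_nonevents = sum(nonevent_counts) + smoothing * n_classes
    woe = []
    for events, nonevents in zip(event_counts, nonevent_counts):
        p_event = (events + smoothing) / total_events
        p_nonevent = (nonevents + smoothing) / total_nonevents
        woe.append(math.log(p_event / p_nonevent))
    return woe

# Hypothetical counts for a factor discretized into two classes.
woe_values = weight_of_evidence([10, 30], [30, 10], smoothing=0.0)
```

A positive value marks a class where landslides are over-represented relative to non-landslide samples, and a negative value marks the opposite. The resulting values replace the raw class labels as inputs to the tree-based models.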

(This article belongs to the Special Issue Spatial Analysis and GIS for Sustainable Land Change Management)

25 pages, 1879 KB

Open Access Article

Research on Multi-Granularity Collaborative Configuration of Flight Slot Coordination Parameters for Delay Mitigation

by Jiangting Yu, Minghua Hu, Bing Jiang, Lei Yang and Zheng Zhao

Aerospace 2026, 13(7), 569; https://doi.org/10.3390/aerospace13070569 (registering DOI) - 24 Jun 2026

Abstract

A scientific multi-granularity configuration scheme for flight slot coordination parameters improves the efficiency of airport resource allocation. This study proposes a collaborative configuration method for hourly and 15 min coordination parameters, with Beijing Capital International Airport serving as a case study. Traditional hourly parameters frequently miss short-term traffic clusters, which leads to sudden delay surges. First, local delays were extracted from March 2024 Automatic Dependent Surveillance-Broadcast (ADS-B) trajectory data, and a delay prediction model was constructed by integrating a non-stationary queuing model with a gradient boosting regression tree. Second, simulated timetables were generated via a Monte Carlo method under various parameter combinations. With a constant daily flight volume as the experimental baseline, a mapping was established between parameter combinations and expected local delays. Finally, feasible delay regions were delineated and interpretable configuration rules were extracted via a decision tree to maximize schedule flexibility. The results indicated that at an hourly parameter of 70 flights, the target delay is maintained below 8 min by tightening the 15 min parameter to 19 flights. The findings suggest that hourly parameters control average load, while 15 min parameters effectively suppress traffic clustering in high-load scenarios. The method provides a quantitative reference for configuring multi-granularity time parameters at hub airports. Full article
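
The interplay between the two granularities can be illustrated with a small check of a timetable against both caps. This is a sketch only: the function name `find_cap_violations`, the use of fixed clock-aligned bins rather than rolling windows, and the sample schedule are assumptions, and only the cap values 70 and 19 come from the abstract.

```python
from collections import Counter

def find_cap_violations(slot_minutes, hourly_cap=70, quarter_cap=19):
    """Return (hours, quarters) whose flight counts exceed the caps.

    slot_minutes holds scheduled times as minutes after midnight.
    Fixed clock-aligned bins are an assumption made for this sketch.
    """
    per_hour = Counter(minute // 60 for minute in slot_minutes)
    per_quarter = Counter(minute // 15 for minute in slot_minutes)
    hours_over = sorted(h for h, n in per_hour.items() if n > hourly_cap)
    quarters_over = sorted(q for q, n in per_quarter.items() if n > quarter_cap)
    return hours_over, quarters_over

# Hypothetical hour with 60 flights in total, 20 of them in the first quarter.
schedule = [5] * 20 + [20] * 15 + [35] * 15 + [50] * 10
hours_over, quarters_over = find_cap_violations(schedule)
```

Here the hour stays within its cap while the first 15 min bin exceeds the finer one, which is the kind of short-term cluster an hourly parameter alone would not catch.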

(This article belongs to the Special Issue Emerging Trends in Air Traffic Flow and Airport Operations Control)

29 pages, 1861 KB

Open Access Article

Physics-Supported Linear and Nonlinear Dimensionality Reduction for Supervised Adaptive Channel Selection in Hybrid RF-FSO-THz Communication Systems

by Luis Miguel Pires and Vitor Fialho

Electronics 2026, 15(13), 2778; https://doi.org/10.3390/electronics15132778 (registering DOI) - 24 Jun 2026

Abstract

Hybrid RF-FSO-THz communication systems are promising candidates for future Internet of Things (IoT) and 6G networks because they combine the robustness of radio frequency links, the high-capacity potential of Free-Space Optical communications, and the ultra-wideband capabilities of terahertz transmission. Adaptive channel selection in such systems depends on multiple correlated environmental and physical-layer variables, including distance, rain intensity, humidity, visibility, turbulence strength, signal-to-noise ratio, channel capacity, and energy-efficiency metrics. This paper presents a physics-supported benchmark framework for supervised adaptive channel selection in hybrid RF-FSO-THz systems and systematically investigates the impact of linear and nonlinear dimensionality-reduction techniques on predictive performance, statistical robustness, computational complexity, and physical interpretability. A multi-scenario dataset comprising 5000 samples was generated using calibrated RF, FSO, and THz propagation models under clear, rain, fog, and worst-case environmental conditions. Principal Component Analysis (PCA) and Kernel PCA were evaluated together with Random Forest, Support Vector Machines (SVMs), XGBoost, Gradient Boosting (GB), Multi-Layer Perceptron (MLP), Logistic Regression, and Decision Trees. The results demonstrate that PCA preserves nearly all predictive capabilities while reducing the original 33-dimensional feature space by approximately 81.8%, maintaining accuracies close to 97–98% with the best-performing classifiers. Statistical significance analysis confirms that PCA introduces only modest degradations, whereas Kernel PCA consistently reduces the predictive performance while increasing memory requirements and inference latency. 
Additional environmental-only validation experiments indicate that adaptive channel selection remains highly learnable even when only pre-selection environmental descriptors are available, partially mitigating concerns regarding self-consistency bias. Overall, the results suggest that PCA provides an advantageous compromise among predictive accuracy, computational efficiency, statistical robustness, and physical interpretability for supervised adaptive channel selection in physics-supported hybrid wireless communication systems. Full article

(This article belongs to the Special Issue Feature Papers in 'Microwave and Wireless Communications' Section, 2nd Edition)

34 pages, 22602 KB

Open Access Article

Toward Predicting Slope Stability Hazard Levels Using Ensemble Learning

by Yulin Zou, Shahab Hosseini, Mohammad Afrazi, Seyed Yaser Mousavi Siamakani, Pijush Samui and Danial Jahed Armaghani

CivilEng 2026, 7(3), 39; https://doi.org/10.3390/civileng7030039 (registering DOI) - 24 Jun 2026

Abstract

The present study investigates the application of conventional and ensemble machine learning models for slope stability prediction, which is essential for landslide risk reduction and sustainable infrastructure management. A database containing 627 slope cases was used, including six input variables: unit weight, cohesion, friction angle, slope angle, slope height, and pore pressure ratio. Six machine learning models, namely Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), Classification and Regression Tree (CART), and Boosted Tree, were developed and evaluated. The models were assessed using ROC analysis, confusion-matrix-derived metrics, precision–recall analysis, feature importance assessment, and unseen testing cases. The results showed that ensemble-based models provided superior predictive performance compared with conventional machine learning models. Based on ROC analysis, RF achieved the highest ROC-AUC value of 0.93, followed by Boosted Tree and XGBoost with ROC-AUC values of 0.92 and 0.90, respectively. Based on confusion-matrix-derived metrics, Boosted Tree achieved the highest accuracy of 0.862 and F1-score of 0.874, while RF showed comparable performance with an accuracy of 0.857 and F1-score of 0.868. Feature importance analysis indicated that cohesion and unit weight were among the most influential variables affecting slope stability prediction. In addition, the unseen testing cases confirmed the practical generalization capability of the ensemble models, with Boosted Tree and RF achieving accuracies of 0.920 and 0.880, respectively. Overall, the findings demonstrate that ensemble learning models, particularly Boosted Tree and RF, can provide reliable and interpretable decision-support tools for preliminary slope stability assessment and landslide hazard management. Full article

(This article belongs to the Section Geotechnical, Geological and Environmental Engineering)

25 pages, 4672 KB

Open Access Article

Data-Efficient and Explainable Multimodal Survival Prediction in NSCLC Using Deep Image Embeddings, Clinical Variables, and Gradient-Boosted Trees

by Sevim Sahin and Adil Gursel Karacor

Diagnostics 2026, 16(12), 1941; https://doi.org/10.3390/diagnostics16121941 (registering DOI) - 22 Jun 2026

Viewed by 176

Abstract

Background/Objectives: Survival prediction in non-small cell lung cancer (NSCLC) remains challenging, particularly in limited-sample settings where end-to-end deep learning models may suffer from limited generalization. This study aimed to develop a data-efficient, multimodal, and explainable framework integrating computed tomography (CT)-derived imaging information with clinical variables for NSCLC survival prediction. Methods: CT images, tumor segmentations, and clinical data from the publicly available NSCLC Radiomics (LUNG1) dataset (377 patients) were used. Tumor-focused regions were extracted using segmentation masks, and pretrained RadImageNet-InceptionV3 embeddings were obtained from the largest tumor-containing slice and neighboring-slice summaries. Deep imaging embeddings, engineered imaging features, and clinical variables were fused into a unified tabular representation. To improve robustness under limited-sample conditions, feature blocks were compressed using principal component analysis. CatBoost, XGBoost, and LightGBM models were trained on a development set and evaluated on a strictly held-out final validation set. Results: In three-class survival stratification, assigning censored/non-event patients to the upper survival group produced the strongest ordinal prognostic performance. Under the EX_PLUS_NON_EX_TOP setting, CatBoost achieved the best holdout score-based class C-index of 0.655. In continuous survival regression, LightGBM achieved the best holdout event-patient C-index of 0.576. Clinical variables provided the dominant prognostic signal, while compact deep image embeddings contributed complementary information, particularly in separating short- and long-survival groups. SHAP analysis confirmed contributions from both clinical and image-derived features. 
Conclusions: The proposed framework provides a proof-of-concept demonstration of a data-efficient and explainable image-to-tabular approach for NSCLC survival prediction under strict internal holdout validation. The results suggest that pretrained CT embeddings, clinical variables, gradient-boosted trees, and SHAP-based interpretation can be combined in a feasible, limited-sample survival modeling pipeline, while external validation remains necessary before clinical translation. Full article

(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

34 pages, 12697 KB

Open Access Article

Hybrid Machine Learning Models for Predicting Gross CO₂e Balance in Polish Forest Stands: A Tool for Sustainable Forest Carbon Assessment in the Circular Economy

by Krzysztof Przybył, Agnieszka A. Pilarska and Krzysztof Pilarski

Sustainability 2026, 18(12), 6366; https://doi.org/10.3390/su18126366 (registering DOI) - 22 Jun 2026

Viewed by 282

Abstract

Forest carbon assessment requires methods that capture the combined effects of stand structure, site conditions, carbon pools, operational emissions, and circular-economy processes. This study aimed to develop and optimize hybrid machine learning models for predicting the gross CO₂e (carbon dioxide equivalent) balance of Polish forest stands using measurable stand- and site-related variables. The research was based on a primary dataset describing forest management in major Polish macroregions in 2020–2024. After data cleaning and preprocessing, multiple machine learning algorithms, including ensemble, boosting, neural, and hybrid models, were trained, validated, and tested. Model performance was assessed using standard regression metrics, overfitting diagnostics, learning curves, and SHAP (Shapley Additive Explanations). Most models achieved high predictive accuracy, with six of ten algorithms reaching R² values above 0.90 on the test set. The reduction in strongly correlated variables helped limit multicollinearity and excessive overlap between predictors and the target variable, supporting a more reliable interpretation of model performance. The CatBoost algorithm achieved the highest predictive performance on the test set (R² = 0.948), while also recording the lowest root mean squared error (RMSE = 152.242). However, the Decision Tree demonstrated the weakest generalization performance (R² = 0.806) on the test set. SHAP analysis identified tree height as the most influential predictor, followed by tree age, number of trees, species composition, and selected habitat features. The novelty of the study lies in integrating hybrid machine learning, interpretable modelling, and circular-economy-related carbon balance components into a single framework for rapid and operational forest carbon assessment in Polish forest stands. Full article

(This article belongs to the Special Issue Sustainable Forest Technology and Resource Management)

22 pages, 6227 KB

Open Access Article

Multi-Source Meteorological–Topographic Modeling of Monthly Power Generation for Mountain Photovoltaic Stations Using Gradient-Boosted Trees

by Pengjie Sun, Ming Wang, Dan Meng, Yang Xu, Chi Cheng and Wei Ju

Energies 2026, 19(12), 2936; https://doi.org/10.3390/en19122936 (registering DOI) - 22 Jun 2026

Viewed by 216

Abstract

Mountain photovoltaic (PV) stations are increasingly deployed in complex terrain, where generation is jointly controlled by solar-resource variability, near-surface meteorology, and local topography. However, the quantitative contribution of topographic factors to regional-scale PV generation remains insufficiently evaluated, and many prediction studies rely on single-station or short-term records. In this study, monthly measured generation from 118 standardized village-level mountain PV stations in Badong County, western Hubei Province, China (2019–2021), was integrated with Solargis Global Horizontal Irradiance (GHI)-related solar-resource data, high-resolution gridded meteorological data, a 25 m digital elevation model, seasonal-cycle variables, and historical-generation features. After seasonally grouped median-absolute-deviation (MAD) outlier screening, GIS-based spatial matching, terrain extraction, and viewshed-derived shading analysis, regression models and climatology baselines were compared under both chronological validation and station-exclusion spatial cross-validation. Under the strict chronological validation, CatBoost achieved the best temporal performance among the tested models (R² = 0.3119, MAE = 2719.7 kWh, RMSE = 3245.6 kWh), slightly outperforming the monthly climatology baseline. In the station-exclusion spatial cross-validation, XGBoost achieved the highest mean R² (0.8659), indicating good spatial transferability to unseen stations. Correlation and partial-correlation analyses showed that the temperature-related variable group and monthly radiation were the dominant meteorological controls, whereas elevation, slope, and terrain shading showed weak direct correlations with monthly generation for already-sited stations. Annual 90% prediction intervals were further estimated using residual bootstrapping, with an empirical coverage of 94.9%. 
The proposed framework provides a practical basis for monthly generation forecasting and operational assessment of already-built distributed PV stations in mountainous regions, while its application to greenfield site selection requires additional site engineering and near-field obstruction information. Full article
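
The seasonally grouped median-absolute-deviation screening named in this abstract can be sketched as follows. This is a minimal example under stated assumptions: the threshold of 3, the 1.4826 consistency factor, the function names, and the sample records are illustrative choices, and the abstract does not report the settings the study used.

```python
from statistics import median

def mad_outlier_flags(values, threshold=3.0):
    """Flag values whose scaled distance from the median exceeds threshold."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    scaled_mad = 1.4826 * mad  # consistency factor for normal data (assumed)
    if scaled_mad == 0:
        return [False] * len(values)  # no spread, so nothing is flagged
    return [abs(v - center) / scaled_mad > threshold for v in values]

def screen_by_season(records, threshold=3.0):
    """records is a list of (season, value); flags keep the input order."""
    by_season = {}
    for index, (season, value) in enumerate(records):
        by_season.setdefault(season, []).append((index, value))
    flags = [False] * len(records)
    for items in by_season.values():
        season_flags = mad_outlier_flags([v for _, v in items], threshold)
        for (index, _), flag in zip(items, season_flags):
            flags[index] = flag
    return flags

# Hypothetical monthly generation values, grouped by season.
records = [("summer", v) for v in [10, 11, 12, 11, 10, 100]]
records += [("winter", v) for v in [4, 5, 6, 5, 4]]
flags = screen_by_season(records)
```

Grouping by season before screening keeps a normal low winter value from being compared against the summer median, so only values unusual for their own season are flagged.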

18 pages, 8978 KB

Open Access Article

Dynamical Precursors and Temporal Persistence of Environmental Forcing in Wave Overtopping at a Field-Scale Breakwater

by Khawar Rehman, Wan Hee Cho, Hwa-Young Lee, Gwang-Ho Seo and Jong Yoon Mun

J. Mar. Sci. Eng. 2026, 14(12), 1130; https://doi.org/10.3390/jmse14121130 - 19 Jun 2026

Viewed by 185

Abstract

Wave overtopping is one of the most complex coastal hazards to characterize in field conditions due to its high non-linearity and the interaction between unsteady hydrodynamics and wave–structure processes. To get insights into the underlying occurrence and persistence of overtopping, this study proposes an integration of numerical and data-driven models. Multi-month field observations made at a breakwater are used to investigate the hydro-meteorological parameters causing overtopping initiation and persistence. High-frequency video-derived overtopping detections are combined with coupled ADCIRC–UnSWAN (ADvanced CIRCulation–Unstructured Simulating WAves Nearshore) hindcasts to construct near-structure hydro-meteorological conditions. The results reveal a clear dynamical asymmetry showing that overtopping initiation corresponds to exceedance of crest elevation at individual wave-scale associated with elevated wave height, water level, wave steepness, and wind characteristics, whereas overtopping persistence depends on short-term temporal effects associated with wave energy, direction, and sustained water levels. Gradient-boosted decision trees, temporal convolutional networks, and Transformer models are employed, demonstrating that persistence cannot be inferred from instantaneous sea-states alone, indicating a separation of timescales between triggering and sustained overtopping dynamics. These findings provide field-scale evidence of distinct hydrodynamic regimes governing overtopping processes, highlighting the importance of temporal characteristics for understanding overtopping dynamics and developing predictive coastal hazard frameworks. Full article

(This article belongs to the Section Coastal Engineering)

21 pages, 11433 KB

Open Access Article

Machine Learning-Assisted Synthesis of Self-Organizing SISO Control Systems with Guaranteed Lyapunov Stability

by Nurgul Shazhdekeyeva, Beket Kenzhegulov, Kamka Uteuliyeva, Gulash Kochshanova, Gulmira Nigmetova, Lyailya Kurmangaziyeva, Raigul Tuleuova, Saya Kenzhegulova and Raushan Moldasheva

Computation 2026, 14(6), 142; https://doi.org/10.3390/computation14060142 - 19 Jun 2026

Viewed by 152

Abstract

The proposed methodology combines analytical control laws with adaptive mechanisms and machine-learning-assisted modules based on regression trees, random forests, and extreme gradient boosting (XGBoost). Machine learning models are employed to approximate unknown nonlinear dynamics, compensate disturbances, and adjust controller parameters, while the overall control structure is constrained by Lyapunov stability conditions. This ensures that the inclusion of data-driven components does not violate the fundamental requirement of system stability. The effectiveness of the proposed approach is evaluated through simulation experiments across three operating modes with varying degrees of nonlinearity and dynamic complexity. The results show that hybrid models incorporating ensemble machine learning methods improved performance compared with the analytical and adaptive baselines examined. XGBoost-based control achieves the lowest error values and the highest level of Lyapunov stability compliance (up to 99.3%). The main contribution of this study lies in the development of a unified synthesis framework in which machine learning is not used as a standalone control strategy but as a machine-learning-assisted support mechanism integrated into a theoretically grounded control architecture. The proposed approach provides a balance between adaptability, accuracy, and rigorous stability guarantees, suggesting potential applicability to simulation-based and offline-assisted control design tasks, while real-time embedded implementation requires additional computational optimization and validation. Full article

(This article belongs to the Section Computational Engineering)

16 pages, 4612 KB

Open Access Article

Discovery-Driven Plasma Proteomics Identifies a Multi-Protein Signature for Amyloid PET Positivity: A Machine Learning Analysis of the Bio-Hermes Cohort

by Stelios Lamprou, Kalliopi Mavromati, Frank J. Gunn-Moore and Terry J. Quinn

Int. J. Mol. Sci. 2026, 27(12), 5533; https://doi.org/10.3390/ijms27125533 (registering DOI) - 18 Jun 2026

Viewed by 205

Abstract

Alzheimer’s disease is a progressive neurodegenerative disorder in which early detection remains limited by the cost and invasiveness of positron emission tomography and cerebrospinal fluid testing. We evaluated whether plasma proteomic profiles could distinguish amyloid PET-positive from amyloid PET-negative individuals using the Bio-Hermes cohort. After quality control and missing-data filtering, 988 participants and 295 proteins were analysed; 31 proteins showing group differences were used for supervised classification. Random Forest, Gradient Boosting, and Neural Network models were trained across four train/test splits with repeated cross-validation and class downsampling. Amyloid-positive and amyloid-negative groups differed across a subset of proteins, with five remaining significant after false discovery rate correction. Tree-based models performed most consistently, with Random Forest and Gradient Boosting achieving AUC values of 0.79–0.81 and balanced accuracy of 0.68–0.73. Eight proteins (SERPINA1, C3, CRP, APOE4, CFH, VTN, C1QTNF5, and PON1) emerged as recurring high-importance features. These findings indicate that discovery-driven plasma proteomics can identify multi-protein signatures associated with amyloid status and can complement established single-analyte blood biomarkers by adding pathway-level information. Full article

(This article belongs to the Special Issue AI, ML and Bioinformatics in Molecular Mechanisms of Human Health and Disease)

17 pages, 2589 KB

Open Access Article

Prediction and Interpretation of the Volumetric Mass Transfer Coefficient in Bioreactors Using a No-Code Platform for Autonomous Machine Learning Model Selection

by Ho-Yeon Lee, Yonghee Shin, Jongsun Won, Jin Ho Lee, Sangmin Park, Sang-Min Paik, Hwa Sung Shin, Moo Sun Hong and Jun-Woo Kim

Processes 2026, 14(12), 1982; https://doi.org/10.3390/pr14121982 - 18 Jun 2026

Viewed by 284

Abstract

The volumetric mass transfer coefficient ($k_{L}a$) governs the design, operation, and scale-up of aerobic bioprocesses, yet its dependence on reactor geometry, impeller design, operating conditions, and fluid properties limits prediction by empirical correlations. Machine learning (ML) improves accuracy but faces two barriers in bioprocess practice: selecting the best model among many candidates requires expertise, and small, highly multicollinear data make models chosen based on test error alone prone to overfitting. Using a browser-based, no-code platform, we trained 14 regression algorithms under an identical pipeline on a published $k_{L}a$ dataset, and introduced a composite objective, the generalization-penalized error (GPE), which is the test RMSE plus the absolute train–test RMSE gap. Minimizing GPE rather than test RMSE expanded the top statistically equivalent group to include not only boosting ensembles but also simpler, interpretable models, indicating that black-box models hold no clear advantage once train–test consistency is assessed. Sensitivity analysis showed that tree models produce discontinuous responses, whereas algebraic learning via elastic net (ALVEN) yields smooth surfaces. Shapley additive explanations (SHAP) and an ontology graph, interpreted by a retrieval-augmented language-model agent, identified rotational speed and gas flow rate as dominant, reproducing the established mass transfer mechanism. The framework offers a reproducible, interpretable, expertise-light route to bioprocess model selection. Full article
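
The generalization-penalized error defined in this abstract (test RMSE plus the absolute train-test RMSE gap) is simple enough to sketch directly. The function names and the two candidate models below are hypothetical and serve only to show how the objective can reorder a ranking based on test RMSE alone.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def generalization_penalized_error(train_rmse, test_rmse):
    """GPE = test RMSE + |train RMSE - test RMSE|, as stated in the abstract."""
    return test_rmse + abs(train_rmse - test_rmse)

# Hypothetical candidates: A fits the training data far more tightly than
# it generalizes, while B is consistent across the two splits.
gpe_a = generalization_penalized_error(train_rmse=0.1, test_rmse=1.0)
gpe_b = generalization_penalized_error(train_rmse=1.0, test_rmse=1.1)
best = "A" if gpe_a < gpe_b else "B"
```

Model A has the lower test RMSE, yet its large train-test gap gives it the higher GPE, so the consistent model B is preferred under this objective.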

(This article belongs to the Special Issue Process Modeling and Optimization in Bioproducts Manufacturing)

23 pages, 2980 KB

Open Access Article

Grouped Feature Representation and Gated Multilayer Perceptron for Event-Level Football Pass Outcome Prediction

by Yijuan Yuan, Shaosong Wang, Yonghong Deng and Zhibin Li

Entropy 2026, 28(6), 703; https://doi.org/10.3390/e28060703 - 17 Jun 2026

Viewed by 234

Abstract

Accurate prediction of football pass outcomes is important for tactical analysis, decision evaluation, and skill-oriented feedback in student football training and physical education. However, event-level pass outcome prediction remains challenging because pass success is jointly influenced by spatial context, defensive pressure, receiver-related cues, and historical coordination between players. To address this issue, this study proposes an information-guided multilayer perceptron (IGMLP) based on grouped feature representation and gated feature fusion using structured event data. In the proposed framework, input variables are organized into interpretable semantic feature groups, including contextual features, pressure-aware features, historical coordination features, and receiver-related features. These groups are encoded through separate branches and adaptively fused by a group-level gating mechanism for nonlinear pass outcome modeling. Unlike conventional gated neural architectures that usually apply generic gates to hidden units, channels, or sequential states, the proposed gated design operates at the semantic feature-group level and adaptively weights football-specific information sources according to their relevance to each pass event. Using the StatsBomb open-event dataset, both prediction and recognition paths were constructed, and the proposed model was compared with standard multilayer perceptron (MLP), residual neural network (ResNet), boosting tree (BT), convolutional neural network (CNN), and long short-term memory network (LSTM). In the prediction path, IGMLP achieved an Accuracy of 0.9184, Precision of 0.9295, Recall of 0.9837, F1-score of 0.9558, and AUC of 0.9325. In the recognition path, IGMLP achieved an Accuracy of 0.9808, Precision of 0.9882, Recall of 0.9902, F1-score of 0.9893, and AUC of 0.9925. These results indicate that semantic feature grouping and gated feature fusion are effective for event-level football pass outcome prediction. 
Full article

(This article belongs to the Section Signal and Data Analysis)

28 pages, 25973 KB

Open Access Article

Forecasting and Enhancing Weight on Bit Through Machine Learning Methods in the Sudanese Oil and Gas Sector

by Asaad Mustafa, Guojun Wen, AL-Wesabi Ibrahim, Wahib Yahya and Abobaker Albabo

Appl. Sci. 2026, 16(12), 6149; https://doi.org/10.3390/app16126149 - 17 Jun 2026

Viewed by 127

Abstract

Drilling optimization seeks to improve the efficiency of drilling operations by tuning controllable parameters such as weight on bit (WOB) and bit rotation speed, with the goal of maximizing the rate of penetration and reducing overall well costs. Managing WOB efficiently and precisely is crucial for adjusting drilling parameters promptly. Consequently, a comparative analysis of ML models is needed to assist practitioners in selecting an appropriate predictive model. This research employs four machine learning methods to forecast WOB: Random Forest (RF), K-Nearest Neighbors (KNN), Gradient Boosting Regression (GBR), and Decision Tree (DT). The methods are evaluated on well drilling data from Western Sudan, marking the first time such data have been used for this purpose. The key accomplishment of this study is the automation of WOB prediction using machine learning techniques tailored to these datasets. The findings indicated that Random Forest was the most dependable of the algorithms tested, with a prediction accuracy of 98% and the lowest RMSE, 1.015. In contrast, KNN, GBR, and DT achieved accuracies of 91.40%, 80.66%, and 100.00%, respectively, with RMSE values of 2.008, 3.011, and 6.27 on the testing dataset. Finally, this research is a pioneering effort in the field in using machine learning techniques to predict weight on bit. The study also presents a publicly available dataset containing details about drilled wells in the Sudanese oil and gas sector, intended for future experiments, algorithm validation, and analysis. Full article
