Search Results (891)

Search Parameters:
Keywords = ridge regression

23 pages, 2846 KB  
Article
Predicting Emergency Department Patient Arrivals at Hospitals Using Machine Learning Techniques
by Abdulmajeed M. Alenezi, Mahmoud Sameh, Meshal Aljohani and Hosam Alharbi
Healthcare 2026, 14(9), 1191; https://doi.org/10.3390/healthcare14091191 - 29 Apr 2026
Abstract
Background/Objective: Emergency Departments (EDs) face persistent challenges with overcrowding, unpredictable patient arrivals, and difficulty forecasting short-term demand. Precise hourly arrival predictions are crucial for effective staffing, optimal resource management, and minimizing entry delays. Methods: This paper develops and evaluates a forecasting framework comparing six approaches (a Seasonal Naive baseline, Exponential Smoothing (ETS), Ridge Regression, LightGBM, a hybrid Temporal Convolutional Network (TCN), and a hybrid Long Short-Term Memory (LSTM) network) using de-identified hourly patient arrival records from an ED in Madinah, Saudi Arabia, covering January–November 2024. A set of 183 engineered features is constructed from cyclical time encodings, weekend and public-holiday indicators, structured autoregressive lags, and volatility measures, with all lag-based features verified to use strictly retrospective information. Models are optimized using Bayesian hyperparameter search and trained under an asymmetric loss function that penalizes underprediction to reflect operational risk. Results: Results on a 14-day hold-out test set show that Ridge Regression achieves the lowest MAE (3.75, R2 = 0.52), with TCN and LSTM essentially tied (MAE 3.80 and 3.85). Diebold–Mariano tests confirm that Ridge, TCN, and LSTM are statistically indistinguishable from one another and that Ridge is marginally significantly better than LightGBM (p=0.028); all four ML models significantly outperform ETS and the Seasonal Naive baseline (p<0.001). On the asymmetric metric, TCN achieves the best AsymRMSE (5.59), reflecting its tendency to err on the safe side of staffing decisions. 
Robustness is confirmed through sensitivity analysis across penalty factors, feature ablation demonstrating the contribution of each feature group without overfitting, expanding-window cross-validation across three independent monthly test periods, and conformal prediction intervals achieving well-calibrated coverage. Conclusions: These results demonstrate that combining engineered temporal features with either a lightweight linear model or a hybrid sequence model yields accurate hourly ED arrival forecasts; whether the achieved accuracy is operationally sufficient for staffing decisions remains a site-specific question that requires clinical validation beyond the scope of this single-center study. Full article
(This article belongs to the Special Issue AI-Driven Healthcare: Transforming Patient Care and Outcomes)
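Ridge regression over strictly retrospective lag features, the winning model above, can be sketched in a few lines. This is a minimal illustration on synthetic hourly data with a single daily cycle, not the authors' 183-feature pipeline; the lag set, penalty, and series are assumptions:

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    # Closed-form ridge solution: w = (X'X + alpha*I)^(-1) X'y
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(0)
hours = np.arange(24 * 60)  # 60 days of hourly timestamps
arrivals = 10 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, hours.size)

# Strictly retrospective lag features (previous hour, same hour yesterday)
lags = [1, 24]
start = max(lags)
X = np.column_stack([arrivals[start - lag: arrivals.size - lag] for lag in lags])
X = np.column_stack([np.ones(len(X)), X])  # intercept column
y = arrivals[start:]

# 14-day hold-out evaluation, mirroring the paper's split
split = len(y) - 24 * 14
w = ridge_fit(X[:split], y[:split], alpha=10.0)
mae = np.abs(X[split:] @ w - y[split:]).mean()
print(round(mae, 2))
```

The paper's asymmetric loss and Bayesian hyperparameter search are omitted here; only the lag construction and hold-out evaluation are illustrated.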

18 pages, 1859 KB  
Article
Explainable Artificial Intelligence for Coffee Quality Control: From Coffee Origins to Aroma Intensity
by Giorgio Felizzato, Eloisa Bagnulo, Giorgia Botta, Giulia Tapparo, Chiara Cordero, Luciano Navarini, Cecilia Cagliero, Erica Liberto and Andrea Caratti
Foods 2026, 15(9), 1543; https://doi.org/10.3390/foods15091543 - 29 Apr 2026
Abstract
Background: Coffee quality is strongly influenced by origin-related factors, or terroir, which shape chemical composition and sensory characteristics. In the specialty coffee sector, where authenticity, traceability, and flavour distinctiveness drive value, understanding the molecular basis of sensory attributes, particularly perceived intensity, is essential. Methods: This study combined analytical chemistry and explainable artificial intelligence to explore relationships between volatile composition, coffee origin, and sensory intensity. Roasted and ground single-origin coffees from five provenances were analysed using headspace solid-phase microextraction coupled with gas chromatography–mass spectrometry (HS-SPME/GC–MS). A Support Vector Machine (SVM) classifier discriminated coffee origins based on volatile profile, and SHapley Additive exPlanations (SHAP) identified key compounds. Ridge Regression (RR) was applied to predict sensory intensity values assigned by an expert panel. Results: The SVM model classified coffee origins with 91% accuracy, and SHAP analysis highlighted the volatiles most responsible for differentiation. RR predicted sensory intensity with R2 = 0.88 and RMSE = 0.38, linking molecular profiles with panel-assigned intensity scores. Conclusions: This approach connects molecular profile with packaging-declared aroma intensity, offering an indirect yet informative link to sensory perception and illustrating the potential of data-driven methods in sensory science. Overall, the proposed explainable AI approach provides a transparent and reproducible connection between chemical composition, sensory traits, and perceived quality. This strategy supports more objective and traceable quality assessment systems, aligning analytical precision with sensory expertise, which is an essential step toward the evolution of quality control in industrial applications. Full article

26 pages, 2925 KB  
Article
Mapping Building-Level Monthly CO2 Emissions of Different Functions: A Case Study of England
by Youli Zeng, Yue Zheng, Jinpei Ou and Xiaoping Liu
Remote Sens. 2026, 18(9), 1344; https://doi.org/10.3390/rs18091344 - 27 Apr 2026
Abstract
Understanding carbon dioxide (CO2) emissions from buildings is critical for shaping effective policies toward sustainable urban development. Previous studies mainly applied bottom-up methods for small areas or top-down downscaling at national, provincial or grid scales. However, limited research has explored the relationship between building functions and CO2 emissions at a larger scale. To bridge this gap, this study employed ridge regression to disaggregate monthly CO2 emissions to the level of different functional buildings across England in 2022 and investigated the relationship between building functions and CO2 emissions. Results show that commercial buildings rank highest in CO2 intensity, reaching 1.49 kg per volume in February, while residential buildings rank lowest, reaching 0.25 kg per volume in July at the national scale, and industrial buildings have the largest total emissions. In addition, regional disparities in economic development and industrial structure contribute to emission differences among buildings of the same function. Temporally, all functional buildings exhibited lower emissions during summer compared to winter. Overall, this study offers a scalable and interpretable framework for understanding urban carbon emissions at high spatial and functional granularity. The findings may offer valuable insights to support government decision-making in urban planning and spatial policy design, thereby contributing to low-carbon development goals. Full article
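The core disaggregation idea, ridge-estimated per-function intensities that allocate regional totals down to individual buildings, can be sketched as follows. All numbers are illustrative assumptions, not the paper's data; the 0.25 and 1.49 intensity values are borrowed from the abstract only to seed the toy example:

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    # Closed-form ridge solution: w = (X'X + alpha*I)^(-1) X'y
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(3)
# Hypothetical total building volume per function (arbitrary units) in 300 regions
volumes = rng.uniform(1, 50, size=(300, 3))      # residential, commercial, industrial
true_intensity = np.array([0.25, 1.49, 0.90])    # illustrative kg-CO2-per-volume values
regional_total = volumes @ true_intensity + rng.normal(0, 2, 300)

# Ridge recovers per-function emission intensities from regional totals ...
intensity = ridge_fit(volumes, regional_total, alpha=0.1)
# ... which then allocate each region's total down to its buildings by function
building_share = volumes * intensity / (volumes @ intensity)[:, None]
print(np.round(intensity, 2))
```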
33 pages, 39404 KB  
Article
Multi-Scale Temporal Uncertainty-Aware Hierarchical Adaptive Ensemble for Intelligent Ship Emission Monitoring and Prediction
by Duc-Anh Pham, Kyeong-Ju Kong, Jung-Min Kim, Hee-Sung Yoon and Seung-Hun Han
J. Mar. Sci. Eng. 2026, 14(9), 799; https://doi.org/10.3390/jmse14090799 - 27 Apr 2026
Abstract
This paper presents a novel Multi-Scale Temporal Uncertainty-aware Hierarchical Adaptive Ensemble (MSTU-HAE) algorithm for intelligent ship emission monitoring and prediction in maritime environmental compliance applications. The maritime shipping industry contributes approximately 3% of global CO2 emissions and significant amounts of nitrogen oxides and sulfur oxides, necessitating advanced predictive monitoring systems. The proposed MSTU-HAE algorithm integrates three key innovations: multi-scale temporal feature extraction using causal convolutions at short-term (5 samples), medium-term (20 samples), and long-term (60 samples) windows; gas-specific attention mechanisms that automatically weight temporal scales based on individual emission gas characteristics; and three-level hierarchical uncertainty quantification encompassing individual model uncertainty, ensemble disagreement, and regulatory compliance risk assessment. Experimental validation was conducted using emission data collected from a fishing vessel over 3 operational days (1732 original samples), augmented to 17,320 samples via controlled replication with noise injection to support model training. Rigorous temporal data splitting with 70%/15%/15% train/validation/test partitioning ensures no data leakage. Comparative analysis against six baseline methods (XGBoost, LSBoost, AdaBoost, Ridge Regression, Random Forest, and K-Nearest Neighbors) demonstrates that MSTU-HAE achieves superior average performance, with R2 = 0.9670 and NSE = 0.9670 across all emission gases. This research contributes a robust, interpretable, and scalable prediction framework that advances the state of the art in maritime environmental monitoring through novel algorithmic innovations in temporal feature learning and uncertainty quantification. Full article
(This article belongs to the Section Ocean Engineering)

13 pages, 1237 KB  
Article
Development of a Medium-Density Genotyping Platform to Accelerate Genetic Gain in Fresh Edible Maize
by Jingtao Qu, Diansi Yu, Wei Gu, Yingjie Zhao, Kai Li, Hui Wang, Pingdong Sun, Felix San Vicente, Xuecai Zhang, Ao Zhang, Hongjian Zheng and Yuan Guan
Plants 2026, 15(9), 1288; https://doi.org/10.3390/plants15091288 - 22 Apr 2026
Abstract
Genotyping is a key step in molecular breeding. Due to its cost-effectiveness, accuracy, and flexibility, genotyping by target sequencing (GBTS) has become a preferred technology for medium-density genotyping. In this study, a new GBTS array for fresh edible maize was developed using resequencing data from 477 lines. The array contains 5759 SNPs evenly distributed across the maize genome, with average minor allele frequency (MAF) and polymorphism information content (PIC) values of 0.40 and 0.36, respectively. These SNPs are closely associated with 1566 functional genes. Cluster analysis of 198 maize lines based on the GBTS array was consistent with their pedigree relationships. Furthermore, 277 fresh waxy maize lines were genotyped and used for genomic selection analyses of hundred-kernel weight, kernel length, and kernel width. Comparative evaluation of different models indicated that Ridge Regression Best Linear Unbiased Prediction (rrBLUP) was the optimal model, with prediction accuracies of 0.33, 0.64, and 0.36, respectively. Additional analyses using different marker densities based on the rrBLUP model showed that prediction accuracy did not increase when the number of markers exceeded 2000, indicating that this array provides sufficient marker density for genetic analysis and genomic selection. Overall, this array provides a useful tool for genetic studies of fresh edible maize and facilitates the application of genomic selection in breeding programs. Full article
(This article belongs to the Section Plant Genetics, Genomics and Biotechnology)
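rrBLUP is ridge regression on the marker matrix: marker effects are shrunk jointly, and "prediction accuracy" is the correlation between predicted and observed phenotypes. A minimal sketch on simulated genotypes follows; the marker count, fixed shrinkage parameter, and train/test split are assumptions (real rrBLUP ties the penalty to estimated variance components):

```python
import numpy as np

rng = np.random.default_rng(2)
n_lines, n_markers = 200, 1000
M = rng.choice([-1.0, 0.0, 1.0], size=(n_lines, n_markers))  # SNP genotype matrix
true_effects = rng.normal(0, 0.1, n_markers)                 # simulated marker effects
y = M @ true_effects + rng.normal(0, 1.0, n_lines)           # simulated phenotype

train, test = slice(0, 150), slice(150, 200)
alpha = float(n_markers)  # fixed shrinkage; a simplification for this sketch
u = np.linalg.solve(M[train].T @ M[train] + alpha * np.eye(n_markers),
                    M[train].T @ (y[train] - y[train].mean()))
pred = M[test] @ u + y[train].mean()
accuracy = np.corrcoef(pred, y[test])[0, 1]  # "prediction accuracy" as in the abstract
print(round(accuracy, 2))
```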

31 pages, 2222 KB  
Article
Parity Regression Estimation
by Vali Asimit, Ziwei Chen, Bogdan Ichim and Pietro Millossovich
Risks 2026, 14(4), 94; https://doi.org/10.3390/risks14040094 - 21 Apr 2026
Abstract
Multiple linear regression remains a foundational predictive methodology across a broad range of applications. We propose a novel regression framework that, rather than minimising the aggregate prediction error associated with the dependent variable, explicitly distributes the risk evenly across all model parameters. This approach provides a structural safeguard that is particularly suitable for data affected by substantial noise, as is often the case in time series environments characterised by regime shifts, structural breaks, and evolving trends. We provide a theoretical characterisation of our proposed estimator, named Parity Regression, and benchmark its analytical properties against existing penalised and shrinkage estimators in the literature. Both synthetic experiments and empirical applications demonstrate that the theoretical guarantees of the proposed method translate into enhanced out-of-sample forecasting stability in practice. Full article

19 pages, 4385 KB  
Article
Impact of Climate Warming on Cropland Water Use Efficiency in Northeast China Based on BESS Satellite Data
by Fenfen Guo, Haoran Wu, Zhan Su, Yanan Chen, Jiaoyue Wang and Xuguang Tang
Remote Sens. 2026, 18(8), 1223; https://doi.org/10.3390/rs18081223 - 17 Apr 2026
Abstract
Understanding the long-term dynamics of cropland water use efficiency (WUE) and its underlying environmental drivers is essential for ensuring food and water security, particularly for regions facing intensified climate change. Here, we investigated the spatial patterns and long-term trends of gross primary productivity (GPP), evapotranspiration (ET), and WUE in cropland ecosystems across Northeast China during the past two decades as the nation’s primary commodity grain base using the time-series Breathing Earth System Simulator (BESS) products. Subsequently, the ridge regression method was used to quantitatively disentangle the relative contributions of key climatic variables to the observed WUE trends of cropland. Our results revealed a pronounced decreasing gradient in both GPP and ET along the southeast–northwest direction. A significant increase in GPP was observed over the 20-year period (p < 0.01), with 95.94% of the cropland area showing positive trends. ET showed a slight, non-significant increase (p > 0.05), though 82.77% of pixels exhibited positive trends, particularly in the northwest. Consequently, WUE showed a widespread and significant enhancement (p < 0.01), with approximately 98% of cropland pixels exhibiting increasing trends. Attribution analysis identified air temperature as the dominant environmental variable, accounting for 92.4% of the observed WUE increase, while solar radiation and precipitation contributed modestly (3.4% and 3.2%, respectively). Our findings underscore the predominant role of thermal conditions in shaping the carbon–water coupling efficiency of agroecosystems in semi-arid to semi-humid transition zones. This study provides quantitative evidence that warming climate, rather than changes in water availability or radiation, has been the primary climatic factor driving the improved cropland WUE over the past two decades. 
These insights have important implications for developing adaptive water management strategies to enhance agricultural climate resilience in Northeast China and similar regions worldwide. Full article
(This article belongs to the Section Remote Sensing in Agriculture and Vegetation)

18 pages, 3217 KB  
Article
Machine Learning-Based Prediction of Multi-Year Cumulative Atmospheric Corrosion Loss in Low-Alloy Steels with SHAP Analysis
by Saurabh Tiwari, Seong Jun Heo and Nokeun Park
Coatings 2026, 16(4), 488; https://doi.org/10.3390/coatings16040488 - 17 Apr 2026
Abstract
Atmospheric corrosion of carbon and low-alloy steels causes direct economic losses that are estimated at around 3.4% of the global GDP, and its accurate multi-year prediction is essential for protective coating selection, service-life estimation, and infrastructure maintenance scheduling. In this study, machine learning (ML) algorithms, including gradient boosting regressor (GBR), eXtreme gradient boosting (XGBoost), random forest (RF), support vector regression (SVR), and ridge regression, were trained on a 600-sample physics-grounded dataset to predict the cumulative atmospheric corrosion loss (µm) of low-alloy steels over 1–10 years of exposure. The dataset was constructed using the exact ISO 9223:2012 dose–response function (DRF) for a first-year corrosion rate and the ISO 9224:2012 power-law multi-year kinetic model (C(t) = C1·t0.5), spanning ISO 9223 corrosivity categories C2–CX across 11 environmental and material input features. All models were evaluated on the original (untransformed) corrosion scale under an 80/20 train/test split and five-fold cross-validation. Gradient boosting achieved the best overall performance with test set R2 = 0.968, CV-R2 = 0.969, RMSE = 10.58 µm, MAE = 5.99 µm, and MAPE = 12.6%. XGBoost was a close second (R2 = 0.958, CV-R2 = 0.960). RF achieved an R2 of 0.944. SHAP (SHapley Additive exPlanations) analysis identified SO2 deposition rate, exposure time, relative humidity, Cl deposition rate, and temperature as the five most influential predictors. The dominance of the SO2 deposition rate (mean |SHAP| = 26.37 µm) and the high second-place ranking of exposure time (13.67 µm) are fully consistent with the ISO 9223:2012 dose–response function and ISO 9224:2012 power-law kinetics, respectively, while among the material features, Cu and Cr contents showed the strongest negative SHAP contributions, confirming their corrosion-inhibiting roles in weathering steels. 
These results establish a physics-consistent, interpretable ML benchmark exceeding R2 = 0.90 for multi-year cumulative corrosion loss prediction and provide a quantitative tool for alloy screening, coating selection in aggressive atmospheric environments, and service-life planning. Full article
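The ISO 9224-style power-law kinetics underlying the dataset, C(t) = C1·t^0.5, can be checked with a one-line function; the first-year rate below is a hypothetical value for illustration, not taken from the paper:

```python
def iso9224_cumulative_loss(c1_um, years, b=0.5):
    # Power-law kinetics: C(t) = C1 * t**b, with b near 0.5 for many low-alloy steels
    return c1_um * years ** b

c1 = 25.0  # hypothetical first-year loss in micrometres
losses = [round(iso9224_cumulative_loss(c1, t), 1) for t in (1, 4, 9)]
print(losses)  # loss doubles at 4x and triples at 9x the exposure time when b = 0.5
```

The square-root exponent is why exposure time ranks so highly in the SHAP analysis: early years dominate the cumulative loss.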

19 pages, 3706 KB  
Article
Non-Destructive Determination of Moisture Content in White Tea During Withering Using VNIR Spectroscopy and Ensemble Modeling
by Qinghai He, Hongkai Shen, Zhiyuan Liu, Benxue Ma, Yong He, Zhi Lin, Weihong Liu, Pei Wang, Xiaoli Li and Peng Qi
Horticulturae 2026, 12(4), 488; https://doi.org/10.3390/horticulturae12040488 - 16 Apr 2026
Abstract
As one of the six major traditional tea types in China, white tea’s quality formation is primarily influenced by the withering process. However, traditional methods for monitoring withering fail to achieve precise and stable control of moisture content. To address this issue, a total of 650 samples were collected at 13 withering time points (0–36 h), and the dataset was split into training and test sets at a 7:3 ratio. This study proposes a PRXBoost ensemble model for quantitative detection of withered white tea, which integrates data augmentation and intelligent algorithms. The ensemble model uses a Bagging-based weighted integration technique to combine Partial Least Squares Regression (PLSR), Ridge, and Extreme Gradient Boosting (XGBoost) models, and it conducts an in-depth analysis of the decision-making process within the PRXBoost model. First, the effectiveness of the data augmentation strategy and the superiority of the gradient descent algorithm are verified through pre-modeling based on the PLSR model and hyperparameter pre-search using the XGBoost model, respectively. Additionally, the Bayes algorithm is employed to optimize the weights of the sub-models, further enhancing the overall predictive performance. The results show that the PRXBoost model achieved the best performance among the compared models on the test set, with R2 = 0.854 and RMSE = 0.080, exceeding the highest R2 of a single model by 6%. These results indicate that PRXBoost provided improved predictive performance for moisture estimation within the current dataset. Finally, the SHapley Additive exPlanations (SHAP) algorithm is used to analyze the influence of each input feature on the prediction results, successfully identifying the 1916 nm and 1453 nm spectral bands as significant influencers of the prediction outcomes. 
These results suggest that the proposed model can support rapid, non-destructive monitoring of moisture evolution and provide actionable information for withering endpoint decision control. Full article
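At prediction time, the Bagging-based weighted integration at the heart of PRXBoost reduces to a weighted average of sub-model outputs. A minimal sketch with made-up held-out predictions and weights (the real weights come from the paper's Bayesian optimization):

```python
import numpy as np

# Hypothetical held-out moisture predictions from the three sub-models
preds = np.array([
    [0.62, 0.41, 0.55],   # PLSR
    [0.60, 0.44, 0.52],   # Ridge
    [0.58, 0.43, 0.50],   # XGBoost
])
y_true = np.array([0.59, 0.43, 0.51])

weights = np.array([0.2, 0.3, 0.5])  # illustrative; tuned by Bayesian search in the paper
ensemble = weights @ preds           # weighted integration of sub-model outputs
rmse = float(np.sqrt(np.mean((ensemble - y_true) ** 2)))
print(round(rmse, 3))
```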

22 pages, 1164 KB  
Article
Carbon Emission Prediction Model for Railway Passenger Stations on the Qinghai–Tibet Plateau
by Guanguan Jia and Qingqin Wang
Sustainability 2026, 18(8), 3881; https://doi.org/10.3390/su18083881 - 14 Apr 2026
Abstract
Controlling operation-stage carbon emissions (CE) from transport buildings is crucial for China’s dual-carbon goals and the ecological security of the Qinghai–Tibet Plateau (QTP), and the sustainable development of plateau transport infrastructure. For plateau railway passenger stations (RPS), limited monitoring and distinctive high-altitude, cold-climate operations make daily CE prediction difficult with conventional measurement- or simulation-based methods. This study develops a machine-learning approach based on a Monte Carlo synthetic database and derives engineering-standard formulas for direct use. Building scale, meteorology and passenger flow volume (PFV) were compiled for 12 representative RPS, and a large synthetic database of daily carbon emission was generated under multiple distribution constraints. With daily mean temperature, heating degree days, altitude, station floor area and PFV as inputs, four models were trained and assessed using mean absolute error, root mean square error, mean absolute percentage error (MAPE) and R2. The results show that random forest (RF) performed best, achieving ~6% MAPE and R2 > 0.99 on the test set, and markedly lower errors than multivariable linear regression. Interpretation of RF via feature importance and partial dependence shows that floor area, altitude and PFV dominate emissions and exhibit nonlinear response patterns. To improve transparency and transferability, ridge regression was used to fit a linear surrogate to RF predictions, producing engineering-standard formulas for daily and annual operation-stage CE. The formulas retain most predictive accuracy while requiring only readily obtainable variables, enabling rapid estimation and scenario analysis for cold, high-altitude RPS. 
The proposed workflow provides a replicable pathway for operational CE assessment in data-scarce regions and supports low-carbon planning, design and operation of RPS on the QTP, thereby contributing to more sustainable infrastructure development in high-altitude regions. Full article
(This article belongs to the Section Green Building)
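Fitting a ridge linear surrogate to a black-box model's predictions, as done above to turn the random forest into engineering-standard formulas, can be sketched as follows; the black-box function, input ranges, and penalty are assumptions standing in for the trained RF:

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    # Closed-form ridge solution: w = (X'X + alpha*I)^(-1) X'y
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(1)
# Hypothetical inputs: floor area (m2), altitude (m), daily passenger flow
X = rng.uniform([2000, 2200, 500], [30000, 4500, 20000], size=(500, 3))

def black_box(X):
    # Stand-in for the trained random forest: mildly nonlinear in its inputs
    return 0.002 * X[:, 0] + 0.01 * X[:, 1] + 0.0005 * X[:, 2] + 1e-7 * X[:, 0] * X[:, 2]

y_rf = black_box(X)
Xs = np.column_stack([np.ones(len(X)), X])  # intercept plus raw features
w = ridge_fit(Xs, y_rf, alpha=1.0)          # the linear surrogate is the "formula"
r2 = 1 - np.sum((Xs @ w - y_rf) ** 2) / np.sum((y_rf - y_rf.mean()) ** 2)
print(round(float(r2), 3))
```

The surrogate trades a little of the black box's accuracy for a formula that needs only readily obtainable inputs, which is exactly the transparency argument made in the abstract.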

11 pages, 2705 KB  
Article
Applying Self-Information-Inspired Encoding to Task-Based fMRI for Decoding Second-Language Proficiency During Naturalistic Speech Listening
by Xin Xiong, Chenyang Zhu, Chunwu Wang and Jianfeng He
Appl. Sci. 2026, 16(8), 3805; https://doi.org/10.3390/app16083805 - 14 Apr 2026
Abstract
Individual differences in second-language (L2) proficiency are expected to influence how listeners parse and represent continuous speech, yet their neural signatures under naturalistic conditions remain unclear. We investigated this question using task-based fMRI during continuous speech listening. A total of 43 healthy participants completed four listening runs synchronized with MRI acquisition via PsychoPy (Peirce, 2007), with eyes open throughout scanning. To promote sustained attention and comprehension, participants provided a native-language oral recall after each run. Based on behavioral proficiency scores, participants were grouped into low- (LP, n = 14), moderate- (MP, n = 14), and high-proficiency (HP, n = 15) groups. We evaluated three temporal information-encoding frameworks derived from BOLD dynamics: direct temporal series, functional connectivity (FC), and self-information weighted inter-subject correlation (ISC-W). Using a 10 × 5-fold nested cross-validation scheme, we tested both categorical classification (Support Vector Machines) for the discrete proficiency groups and multivariate regression (Ridge/Lasso) for continuous proficiency scores. Furthermore, we applied ROI-based ANOVA and univariate Neural Correlation Analysis (NCA) to identify key brain regions, evaluating significance via nonparametric permutation testing (1000 permutations) and False Discovery Rate (FDR) correction. Results indicated that while categorical classification yielded numerical trends, with ISC-W performing best, it did not reach statistical significance under stringent permutation testing. However, multivariate regression using ISC-W features successfully predicted continuous proficiency scores with statistical significance (p < 0.05). Exploratory ROI analysis highlighted the bilateral orbital inferior frontal gyrus (IFG_orb_bilat) as a highly sensitive region. 
These findings suggest that L2 proficiency is best represented as a distributed, continuous neural variable, and that self-information weighting effectively filters background noise to capture cognitive variance. Methodologically, this study provides a reproducible pipeline integrating information-theoretic feature construction with rigorous whole-brain nonparametric inference. Full article

36 pages, 11621 KB  
Article
Predictive Modelling of Nitrogen Content in Molten Metal During BOF Steelmaking Processes via Python-Based Machine Learning: A Benchmarking of Statistical Techniques
by Jaroslav Demeter, Branislav Buľko and Martina Hrubovčáková
Appl. Sci. 2026, 16(8), 3774; https://doi.org/10.3390/app16083774 - 12 Apr 2026
Abstract
This study benchmarks eight Python-based machine learning models for predicting nitrogen content across four sequential stages of BOF steelmaking. A dataset of 291 metallic samples from 76 heats was employed, covering pig iron desulfurization (PHASE #1), crude steel before BOF tapping (PHASE #2), and secondary metallurgy start (PHASE #3) and completion (PHASE #4). Linear regression, polynomial regression, ridge regression, decision tree, random forest, feedforward neural networks (FNNs), Gaussian Process Regression (GPR), and Support Vector Regression (SVR) were implemented in Python 3 with Z-score normalization and an 80/20 train–test split, and evaluated via MAE, MSE, MAPE, and R2. Ridge regression achieved the highest accuracy in PHASE #1 (84.59%) and PHASE #4 (84.04%); FNNs excelled in PHASE #2 (78.27%) with consistent cross-phase performance; linear regression was optimal for PHASE #3 (79.06%). The advanced kernel-based methods demonstrated competitive performance, with GPR achieving 84.73% in PHASE #1 and SVR attaining 77.10% in PHASE #3 and 83.40% in PHASE #4, confirming their suitability for limited industrial datasets with a nonlinear structure. A hybrid strategy remains recommended: ridge regression for PHASES #1 and #4, FNNs for PHASES #2 and #4, and linear regression for PHASE #3, with SVR as a robust alternative in phases with moderate nonlinearity. Full article
(This article belongs to the Special Issue Digital Technologies Enabling Modern Industries, 2nd Edition)

16 pages, 4604 KB  
Article
Simulation and Experiment of the Interaction Process Between Seeding and Soil-Engaging for Transverse Sugarcane Planter
by Biao Zhang, Dan Pan, Qiancheng Liu, Weimin Shen and Guangyi Liu
Agriculture 2026, 16(8), 853; https://doi.org/10.3390/agriculture16080853 - 12 Apr 2026
Abstract
Uneven seed spacing, skewed stalk posture, and inconsistent planting depth remain major challenges in horizontal sugarcane planting. To address these issues, a semi-automatic transverse sugarcane planter integrating a supply–buffer–discharge seeder and multiple soil-engaging components was developed. The seed placement process and the interaction between stalk discharge and soil disturbance were investigated through Discrete Element Method (DEM) simulations and experiments. First, the working principle and key component parameters of the whole machine were determined. The machine integrated the processes of soil crushing, furrowing, seeding, and ridge covering. In addition, a dynamic analysis was conducted on the inter-particle disengagement effect during the two-step seed filling process of lifting and discharging. Second, a DEM simulation model of the entire soil-engaging seed arrangement operation was established for the machine, coupling soil disturbance flow with stalk-seed discharge behaviour, and used to study the effects of forward speed and seed outlet position. Furthermore, a response surface methodology (RSM) experiment was performed on the seeding test bench to quantify the effects of guiding parameters on seed placement uniformity. The determination coefficient (R2) of the established regression model exceeded 0.9, indicating high prediction accuracy. The optimal parameter combination was a forward speed of 1.2 m·s−1, a buffer inclination angle of 55°, and a supply roller speed of 26 r·min−1. After verification, the seed placement uniformity coefficient of the seeder reached 91.8 ± 1.4%, meeting the expected accuracy requirements for horizontal planting. Full article
(This article belongs to the Section Agricultural Technology)
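The RSM regression step mentioned above — fitting a full second-order (quadratic) response surface to bench-test runs and checking R2 — can be sketched in Python. The design points, the toy response function, and its peak (placed near the study's reported optimum of 1.2 m·s−1, 55°, 26 r·min−1) are all hypothetical stand-ins for the real bench data; scikit-learn is assumed.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
# Hypothetical design: forward speed (m/s), buffer angle (deg), roller speed (r/min)
X = rng.uniform([0.8, 45, 20], [1.6, 65, 32], size=(30, 3))

def uniformity(x):
    # Toy response surface peaking near (1.2 m/s, 55 deg, 26 r/min)
    v, a, n = x[:, 0], x[:, 1], x[:, 2]
    return 92 - 8 * (v - 1.2) ** 2 - 0.01 * (a - 55) ** 2 - 0.05 * (n - 26) ** 2

y = uniformity(X) + rng.normal(scale=0.2, size=len(X))  # measurement noise

# Full quadratic RSM model: linear, interaction, and squared terms
quad = PolynomialFeatures(degree=2, include_bias=False)
model = LinearRegression().fit(quad.fit_transform(X), y)
pred = model.predict(quad.transform(X))
print("R2 =", r2_score(y, pred))  # the study reports R2 > 0.9 for its model
```

The fitted quadratic can then be maximized (analytically or by grid search) to locate the optimal parameter combination, which is the standard RSM workflow.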

17 pages, 834 KB  
Article
Improved Data-Driven Shrinkage Estimators for Regression Models Under Severe Multicollinearity
by Ali Rashash R. Alzahrani and Asma Ahmad Alzahrani
Mathematics 2026, 14(8), 1245; https://doi.org/10.3390/math14081245 - 9 Apr 2026
Abstract
Multicollinearity is a critical issue in regression analysis, often resulting in inflated variances and unstable parameter estimates. Ridge regression is a widely adopted solution to this challenge; however, existing ridge estimators are typically tailored to specific scenarios, limiting their universal applicability. Akhtar and Alharthi developed condition-adjusted ridge estimators (CAREs) to handle severe multicollinearity; however, their approach did not account for error variances in the estimation process. In this study, we propose improvements to these CAREs by incorporating error variances, resulting in multiscale ridge estimators (MSRE1, MSRE2, MSRE3, and MSRE4) that more effectively address the challenges posed by severe multicollinearity. We compare the performance of the newly proposed estimators with ordinary least squares (OLS) and other existing ridge estimators using both simulation studies and real-life datasets. The evaluation, based on estimated mean squared error (MSE), demonstrates that the proposed estimators consistently outperform existing methods, particularly in scenarios with significant multicollinearity, larger sample sizes, and higher predictor dimensions. Results from three real-life datasets further validate the proposed estimators’ ability to reduce estimation error and improve predictive accuracy across diverse practical applications. Full article
(This article belongs to the Special Issue Statistical Machine Learning: Models and Its Applications)
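The kind of Monte Carlo comparison this paper relies on — generating severely multicollinear predictors and comparing estimators by estimated coefficient MSE — can be sketched as follows. The proposed MSRE estimators are not reproduced here; as a stand-in this sketch pits OLS against ridge with the classical Hoerl–Kennard–Baldwin shrinkage parameter, using hypothetical simulation settings.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 4, 50, 500
beta = np.ones(p)
rho = 0.99  # equicorrelation: severe multicollinearity

cov = np.full((p, p), rho)
np.fill_diagonal(cov, 1.0)
L = np.linalg.cholesky(cov)

def mse(estimates):
    # Estimated MSE: average squared distance of estimates from the true beta
    return np.mean(np.sum((estimates - beta) ** 2, axis=1))

ols_est, ridge_est = [], []
for _ in range(reps):
    X = rng.normal(size=(n, p)) @ L.T
    y = X @ beta + rng.normal(size=n)
    XtX = X.T @ X
    b_ols = np.linalg.solve(XtX, X.T @ y)
    resid = y - X @ b_ols
    s2 = resid @ resid / (n - p)
    k = p * s2 / (b_ols @ b_ols)  # Hoerl-Kennard-Baldwin ridge parameter
    b_ridge = np.linalg.solve(XtX + k * np.eye(p), X.T @ y)
    ols_est.append(b_ols)
    ridge_est.append(b_ridge)

print("OLS   MSE:", mse(np.array(ols_est)))
print("Ridge MSE:", mse(np.array(ridge_est)))
```

Under this degree of multicollinearity the ridge MSE comes out well below the OLS MSE; comparing several candidate choices of `k` on the same simulated replicates is the usual way such estimator studies are structured.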

15 pages, 1474 KB  
Article
Prognostic Power of Ensemble Learning in Colorectal Cancer with Peritoneal Metastasis: A Multi-Institutional Analysis
by Yoshiko Bamba, Michio Itabashi, Hirotoshi Kobayashi, Kenjiro Kotake, Masayasu Kawasaki, Yukihide Kanemitsu, Yusuke Kinugasa, Hideki Ueno, Kotaro Maeda, Takeshi Suto, Kimihiko Funahashi, Heita Ozawa, Fumikazu Koyama, Shingo Noura, Hideyuki Ishida, Masayuki Ohue, Tomomichi Kiyomatsu, Soichiro Ishihara, Keiji Koda, Hideo Baba, Kenji Kawada, Yojiro Hashiguchi, Takanori Goi, Yuji Toiyama, Naohiro Tomita, Eiji Sunami, Yoshito Akagi, Jun Watanabe, Kenichi Hakamada, Goro Nakayama, Kenichi Sugihara and Yoichi Ajiokaadd Show full author list remove Hide full author list
Bioengineering 2026, 13(4), 434; https://doi.org/10.3390/bioengineering13040434 - 8 Apr 2026
Abstract
Background: Owing to significant clinical heterogeneity, the achievement of accurate survival forecasting for individuals with colorectal cancer and peritoneal metastasis continues to be a complex undertaking. We aimed to transcend traditional prognostic limitations by evaluating machine learning boosting models against standard regression-based methods in terms of estimating overall survival (OS). Methods: Utilizing a multi-institutional registry of 150 patients diagnosed with synchronous peritoneal metastasis of colorectal cancer, we integrated 124 clinicopathological variables to refine our predictive models. Beyond standard preprocessing—including standardization and median imputation—we rigorously compared XGBoost and LightGBM against Ridge, Lasso, and linear regression via five-fold cross-validation. To specifically address right-censoring, an XGBoost Cox model was implemented and validated using Harrell’s C-index, with SHAP and LIME providing essential model interpretability. Results: Boosting models consistently outperformed linear alternatives, which struggled with high error rates and negative R2 values. Specifically, XGBoost achieved an MAE of 475 ± 60 and an RMSE of 585 ± 88. The XGBoost Cox model reached a C-index of 0.64 ± 0.06. SHAP analysis highlighted inflammatory markers and peritoneal disease extent as the most influential prognostic drivers. Conclusions: While boosting models offer a clear accuracy advantage over linear methods, their prognostic power remains moderate. These findings underscore the potential of ensemble learning in oncology, yet mandate external validation before these tools can be integrated into clinical decision-making. Full article
(This article belongs to the Section Biosignal Processing)
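Harrell's C-index, used above to validate the XGBoost Cox model under right-censoring, is the fraction of comparable patient pairs in which the model assigns the higher risk score to the patient who fails earlier. A minimal sketch (a simple strict-time variant; the data below are hypothetical, not from the study):

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Harrell's C: among comparable pairs, the fraction where the higher-risk
    patient fails earlier. Ties in risk score count as 0.5."""
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant = comparable = 0.0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical cohort: survival times, event flags (1 = death), model risk scores
time = [100, 200, 300, 400, 500]
event = [1, 1, 0, 1, 0]
risk = [2.0, 1.5, 1.0, 0.5, 0.2]
print(harrell_c_index(time, event, risk))  # risk ordering matches outcomes -> 1.0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect concordance, which puts the study's reported 0.64 ± 0.06 in context as moderate discriminative ability.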
