Search Results (4,477)

Search Parameters:
Keywords = Regression Tree

19 pages, 2743 KB  
Article
Capturing Emotions Induced by Fragrances in Saliva: Objective Emotional Assessment Based on Molecular Biomarker Profiles
by Laurence Molina, Francisco Santos Schneider, Malik Kahli, Alimata Ouedraogo, Mellis Alali, Agnés Almosnino, Julie Baptiste, Jeremy Boulestreau, Martin Davy, Juliette Houot-Cernettig, Telma Mountou, Marine Quenot, Elodie Simphor, Victor Petit and Franck Molina
Biosensors 2026, 16(2), 81; https://doi.org/10.3390/bios16020081 - 28 Jan 2026
Abstract
In this study, we describe a non-invasive approach to objectively assess fragrance-induced emotions using multiplex salivary biomarker profiling. Traditional self-reports, physiological monitoring, and neuroimaging remain limited by subjectivity, invasiveness, or poor temporal resolution. Saliva offers an advantageous alternative, reflecting rapid neuroendocrine changes linked to emotional states. We combined four key salivary biomarkers, cortisol, alpha-amylase, dehydroepiandrosterone, and oxytocin, to capture multidimensional emotional responses. Two clinical studies (n = 30, n = 63) and one user study (n = 80) exposed volunteers to six fragrances, with saliva collected before and 5 and 20 min after olfactory stimulation. Subjective emotional ratings were also obtained through questionnaires or an implicit approach. Rigorous analytical validation accounted for circadian variation and sample stability. Biomarker patterns revealed fragrance-induced emotional profiles, highlighting subgroups of participants whose biomarker dynamics correlated with particular emotional states. Increased oxytocin and decreased cortisol levels aligned with happiness and relaxation; in comparison, distinct biomarker combinations were associated with confidence or dynamism. Classification and Regression Trees (CART) analysis results demonstrated high sensitivity for detecting these profiles. Validation in an independent cohort using an implicit association test confirmed concordance between molecular profiles and behavioral measures, underscoring the robustness of this method. Our findings establish salivary biomarker profiling as an objective tool for decoding real-time emotional responses. Beyond advancing affective neuroscience, this approach holds translational potential in personalized fragrance design, sensory marketing, and therapeutic applications for stress-related disorders. Full article
(This article belongs to the Special Issue Biosensing and Diagnosis—2nd Edition)

9 pages, 756 KB  
Proceeding Paper
Effect of Data Preparation on Machine Learning Models for Diabetes Prediction
by Goran Martinović, Ivan Ivković, Domen Verber and Tatjana Bačun
Eng. Proc. 2026, 125(1), 13; https://doi.org/10.3390/engproc2026125013 - 28 Jan 2026
Abstract
This paper examines how data preparation affects machine-learning classifiers for diabetes-risk prediction using the Pima Indians Diabetes Database. Three preprocessing methods are considered: imputing invalid zeros, handling outliers, and data scaling. Nine algorithms are evaluated on this dataset: linear/probabilistic baselines (Logistic Regression, Gaussian Naive Bayes), distance-based methods (KNN, Support Vector Machines), a single tree-based model (Decision Tree), and tree ensembles (Random Forest, Gradient Boosting, XGBClassifier, LightGBM). Median imputation of invalid zeros yields the largest and most consistent gains in accuracy and AUC. Outlier handling uses interquartile-range filtering, with Local Outlier Factor as an auxiliary indicator; effects are modest for accuracy and small, model-dependent for AUC. Scaling offers targeted benefits: for KNN, robust scaling can slightly alter performance and may reduce AUC relative to median-only imputation in this setup; SVM shows modest gains, while tree ensembles are comparatively insensitive overall. Ensembles achieve the highest performance and remain robust under minimal preparation, while simpler models benefit most from pipelines combining median imputation, careful outlier handling, and appropriate scaling. Hyperparameter tuning yields small to substantial gains—large for Decision Trees—while leaving ensemble rankings largely unchanged. Overall, results highlight the centrality of median imputation and the selective value of scaling for distance-based classifiers in diabetes-risk prediction. Full article
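
As a rough illustration of the median-imputation step this abstract describes, the sketch below treats zeros in a single feature column as missing and replaces them with the median of the remaining values. The function name `impute_invalid_zeros` and the sample readings are hypothetical; this is not the authors' pipeline, only a minimal sketch of the general technique.

```python
from statistics import median


def impute_invalid_zeros(values):
    """Replace zeros (treated as missing) with the median of the non-zero values."""
    valid = [v for v in values if v != 0]
    if not valid:
        # Nothing to estimate a median from; return the column unchanged.
        return list(values)
    fill = median(valid)
    return [fill if v == 0 else v for v in values]


# Hypothetical glucose-like readings; the two zeros stand in for invalid entries.
glucose = [148, 85, 0, 89, 137, 0, 116]
print(impute_invalid_zeros(glucose))  # [148, 85, 116, 89, 137, 116, 116]
```

The median is computed from the valid entries only, so the invalid zeros do not drag the fill value down.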

24 pages, 6667 KB  
Article
Data-Driven Forecasting of Electricity Prices in Chile Using Machine Learning
by Ricardo León, Guillermo Ramírez, Camilo Cifuentes, Samuel Vergara, Roberto Aedo-García, Francisco Ramis Lanyon and Rodrigo J. Villalobos San Martin
Appl. Sci. 2026, 16(3), 1318; https://doi.org/10.3390/app16031318 - 28 Jan 2026
Abstract
This study proposes and evaluates a data-driven framework for short-term System Marginal Price (SMP) forecasting in the Chilean National Electric System (NES), a power system characterized by high penetration of variable renewable generation and persistent transmission congestion. Using publicly available hourly operational data for 2024, multiple machine learning regressors including Linear Regression (base case), Bayesian Ridge, Automatic Relevance Determination, Decision Trees, Random Forests, and Support Vector Regression are implemented under a node-specific modeling strategy. Two alternative approaches for predictor selection are compared: a system-wide methodology (M1) that exploits lagged SMP information from all network nodes, and a spatially filtered methodology (M2) that restricts SMP inputs to correlated subsystems identified through nodal correlation analysis. Model robustness is explicitly assessed by reserving January and July as out-of-sample test periods, capturing contrasting summer and winter operating conditions. Forecasting performance is analyzed for representative nodes located in the northern, central, and southern zones of the NES, which exhibit markedly different congestion levels and generation mixes. Results indicate that non-linear and ensemble models, particularly Random Forest and Support Vector Regression, provide the most accurate forecasts in well-connected areas, achieving mean absolute errors close to 10 USD/MWh. In contrast, forecast errors increase substantially in highly congested southern zones, reflecting the structural influence of transmission constraints on price formation. While average performance differences between M1 and M2 are modest, a paired Wilcoxon signed-rank test reveals statistically significant improvements with M2 in highly congested zones, where M2 yields lower absolute errors for most models, despite relying on fewer inputs.
These findings highlight the importance of congestion-aware feature selection for reliable price forecasting in renewable-intensive systems. Full article
(This article belongs to the Special Issue New Trends in Renewable Energy and Power Systems)

21 pages, 1645 KB  
Article
Machine Learning-Based Prediction of Optimum Design Parameters for Axially Symmetric Cylindrical Reinforced Concrete Walls
by Aylin Ece Kayabekir
Processes 2026, 14(3), 455; https://doi.org/10.3390/pr14030455 - 28 Jan 2026
Abstract
This study presents a hybrid approach integrating metaheuristic optimization and machine learning methods to quickly and reliably estimate the optimum design parameters of dome-shaped axially symmetric cylindrical reinforced concrete (RC) walls. A comprehensive dataset was created using the Jaya algorithm to minimize total material cost for hinged and fixed support conditions. For each optimized design case, total wall height (H), dome height (Hd), dome thickness (hd), and fluid unit weight (γ) were considered as input parameters; optimum wall thickness (hw) and total cost were determined as output parameters. Using the obtained dataset, a total of thirteen different regression-based machine learning algorithms, including linear regression-based models, tree-based ensemble methods, and neural network models, were trained and tested. Hyperparameter adjustments for all models were performed using the Optuna framework, and model performances were evaluated using a ten-fold cross-validation method and holdout dataset results. The results showed that machine learning models can learn the optimum design space obtained from metaheuristic optimization outputs with high accuracy. In optimum wall thickness estimation, Gradient Boosting-based models provided the highest accuracy under both hinged and fixed support conditions. In total cost estimation, the Gradient Boosting model stood out under hinged support conditions, while the XGBoost model yielded the most successful results for fixed support conditions. The findings clearly show that no single machine learning model exhibits the best performance for all output parameters and support conditions. The proposed approach offers significantly higher computational efficiency compared to traditional iterative optimization processes and allows for rapid estimation of optimum design parameters without the need for any iterations. 
In this respect, this study provides an effective decision support tool that can be used especially in the preliminary design phases and contributes to sustainable, cost-effective reinforced concrete structure design. Full article
(This article belongs to the Special Issue Machine Learning Models for Sustainable Composite Materials)

14 pages, 4548 KB  
Article
Feasibility Study of Combining Data from Different Sources Within Artificial Intelligence Models to Reduce the Need for Constant Velocity Joint Test Rig Runs
by Julian Lehnert, Orkan Eryilmaz, Arne Berger and Dirk Reith
Machines 2026, 14(2), 148; https://doi.org/10.3390/machines14020148 - 28 Jan 2026
Abstract
This paper investigates the feasibility of reducing test rig runs in constant velocity joint (CVJ) development by combining data from different sources (simulation and test rig) for artificial intelligence (AI) models. To this end, a case study on CVJ efficiency prediction using a random forest regressor, a decision-tree-based algorithm, was conducted using a data set of 95,798 points derived from both test rigs (52,486 points) and multi-body simulations (43,312 points). The amount of test rig data in the training data set of the regression model was iteratively reduced from 100% to 12.5% to investigate the need for expensive test rig data. Additionally, clustering models based on the KMeans algorithm were applied to further improve the AI models and to gain more information about the data. Furthermore, regression and clustering models were built on data dimensionally reduced by principal component analysis (PCA) to reduce model complexity and improve performance. The number of principal components for the regression model was reduced from 65 to 5 components to investigate their influence on the model's predictions. The study showed that combining data from different sources has a positive impact on the predictions of AI models and the confidence of their results, even though the R² score of the trained regression models did not change significantly, ranging from 0.927% to 0.9497%. Full article
(This article belongs to the Special Issue Advances in Dynamics and Vibration Control in Mechanical Engineering)

29 pages, 5001 KB  
Article
Integrated Assessment of Soil Loss and Sediment Delivery Using USLE, Sediment Yield, and Principal Component Analysis in the Mun River Basin, Thailand
by Pee Poatprommanee, Supanut Suntikoon, Morrakot Khebchareon and Schradh Saenton
Land 2026, 15(2), 220; https://doi.org/10.3390/land15020220 - 27 Jan 2026
Abstract
The Mun River Basin, the largest Mekong tributary in Northeast Thailand, has experienced extensive agricultural expansion and forest decline, raising concerns over increasing soil erosion and sediment transfer. This study provides an integrated assessment of soil loss, sediment yield (SY), and sediment delivery ratio (SDR) across 19 sub-watersheds using the Universal Soil Loss Equation (USLE), field-based SY data, and multivariate statistical analyses in 2024. Basinwide soil loss was estimated at ~35 million t y−1 (mean 4.96 t ha−1 y−1), with more than 80% of the basin classified in the no erosion to very low erosion classes. Despite substantial hillslope erosion, only 402,405 t y−1 of sediment reaches the river network, corresponding to a low SDR of 1.15%, which falls within the range reported for large tropical watersheds with significant reservoir infrastructure. Soil loss is most strongly influenced by slope and forested terrain, while SY responds primarily to rainfall and tree plantations; urban land, croplands, and reservoirs act as sediment sinks. Principal Component Analysis (PCA) resolved multicollinearity and produced six components explaining over 90% of predictor variance. A PCA-based regression model predicted SY per unit area with high accuracy (r = 0.81). The results highlight the dominant roles of hydroclimate and land-use structure in shaping sediment connectivity, supporting targeted soil and watershed-management strategies. Full article
(This article belongs to the Section Land Use, Impact Assessment and Sustainability)
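
The sediment delivery ratio quoted in this abstract can be checked from the two figures it reports. The worked step below assumes the usual definition of SDR as sediment yield divided by gross soil loss; the symbol $A$ for gross soil loss is a notational assumption, and the soil-loss figure is the abstract's rounded "~35 million" value, so the result is approximate.

```latex
\[
\mathrm{SDR} = \frac{\mathrm{SY}}{A} \times 100\%
\approx \frac{402{,}405\ \mathrm{t\,y^{-1}}}{35 \times 10^{6}\ \mathrm{t\,y^{-1}}} \times 100\%
\approx 1.15\%
\]
```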

20 pages, 2796 KB  
Article
A GBRT-Based State-of-Health Estimation Method for Lithium-Ion Batteries
by Chun Chang, Yedong He, Yutong Wu, Yuanzhong Xu and Jiuchun Jiang
Energies 2026, 19(3), 659; https://doi.org/10.3390/en19030659 - 27 Jan 2026
Abstract
Lithium-ion batteries are widely applied in transportation, communication, and other fields. Nevertheless, during prolonged cycling operation, internal electrochemical reactions inevitably lead to the degradation of the state-of-health (SOH). To ensure the reliability and safety of lithium-ion batteries, accurate SOH estimation is of critical importance. Nevertheless, under practical operating conditions, obtaining fully recorded charge–discharge data is often impractical. Motivated by the practical charging behaviors of lithium-ion batteries, this paper proposes a practical SOH estimation method based on incremental capacity analysis, dynamic time warping (DTW), and gradient-boosting regression trees (GBRTs). Three health indicators—interval incremental capacity features, local capacity–voltage curve similarity, and segmented voltage curve similarity—are extracted. The proposed method requires only 0.13 V and 0.07 V voltage windows on the Oxford and CALCE datasets. The effectiveness of the proposed model is verified across both public datasets and laboratory test data. Experimental results demonstrate RMSE values of approximately 2.5% and 2.0%, respectively. Compared with mainstream SOH estimation algorithms, the proposed approach delivers comparable accuracy while achieving training time reductions of up to 57.6% and 91.9% relative to GPR and SVM, making it suitable for real-time battery management systems. Full article
(This article belongs to the Section D: Energy Storage and Application)

15 pages, 2511 KB  
Article
Topographic Heterogeneity Drives the Functional Traits and Stoichiometry of Abies georgei var. smithii Bark in the Sygera Mountains, Southeast Tibet
by Wenyan Xu, Jie Lu, Chao Wang and Rui Li
Forests 2026, 17(2), 163; https://doi.org/10.3390/f17020163 - 27 Jan 2026
Abstract
Bark is a multifunctional organ critical for tree survival, yet its functional plasticity in response to micro-environmental heterogeneity at alpine timberlines remains poorly understood. Here, we investigated the variations in bark physical traits (thickness, density), allometric scaling, and stoichiometric characteristics (C, N, P) of Abies georgei var. smithii (Viguie & Gaussen) W. C. Cheng & L. K. Fu on contrasting sunny and shady slopes in the Sygera Mountains, southeastern Tibetan Plateau. Despite the relative homogeneity of soil physicochemical properties between slope aspects, bark traits exhibited remarkable phenotypic plasticity. Trees on the shady slope possessed significantly thicker bark with higher nitrogen concentrations, adopting a “resource-acquisitive strategy”. Standardized Major Axis (SMA) regression indicated isometric scaling (b ≈ 1.03) for trees on the shady slope, reflecting a sustained investment in bark thickness to provide thermal insulation against cold stress. Conversely, trees on the sunny slope exhibited negative allometry (b ≈ 0.87), characterized by denser tissues and elevated C/N ratios. This shift represents a conservative strategy geared toward hydraulic safety and resistance to high radiation and evaporative loss. Crucially, our results show that bark traits are largely decoupled from soil nutrient gradients, being shaped instead by microclimate. The distinct trade-off—prioritizing insulation on shady slopes versus conservation on sunny slopes—underscores the importance of phenotypic plasticity for the persistence of timberline species in a changing climate. Full article
(This article belongs to the Section Forest Ecophysiology and Biology)

20 pages, 1141 KB  
Article
Machine Learning Applications for Sustainable Housing Policy: Understanding Price Determinants to Inform Affordable Housing Strategies
by Fan Zhang, Yifang Luo, Yuqing Dong, Qikai Zhang and Aihua Han
Algorithms 2026, 19(2), 98; https://doi.org/10.3390/a19020098 - 26 Jan 2026
Abstract
Understanding how housing attributes are capitalized into prices is central to addressing urban affordability challenges. Using 2799 second-hand housing transactions from Wenzhou, China, this study examines residential price formation under pronounced spatial and structural heterogeneity. Multiple predictive models are evaluated within a unified 10-fold cross-validation framework. Results indicate that Random Forest delivers the strongest predictive performance, achieving a normalized mean squared error below 0.10 and explaining over 90% of out-of-sample price variation, substantially outperforming hedonic regression, regression trees, bagging, boosting, and support vector models. Permutation-based importance analysis identifies district location, building scale, and floor area as the dominant price determinants, while the influence of renovation quality, transportation access, and educational amenities varies across districts and dwelling types. These findings reveal strong nonlinearities and heterogeneous valuation mechanisms in rapidly urbanizing housing markets. Methodologically, the study demonstrates how interpretable machine learning complements traditional hedonic analysis, while providing policy-relevant insights into housing affordability dynamics in medium-sized Chinese cities. Full article
(This article belongs to the Special Issue Algorithms for Smart Cities (3rd Edition))
21 pages, 2364 KB  
Article
A Machine Learning Approach to Understanding Teacher Engagement in Sustainable Education Systems
by Esra Geçikli and Figen Çam-Tosun
Systems 2026, 14(2), 121; https://doi.org/10.3390/systems14020121 - 26 Jan 2026
Abstract
Education can be conceptualized as a complex socio-technical system in which teacher engagement functions as a dynamic component supporting system performance and adaptability. The present study examines how science teachers’ perceptions of sustainable education interact with their levels of work engagement, providing empirical insights into system-level relationships relevant to educational sustainability. The study sample consisted of 246 science teachers, and data were collected using the Sustainable Education Scale and the Engaged Teacher Scale. Adopting a systems-informed analytical perspective, the study employs machine learning methods (Random Forest, CART, Extra Trees, and Bagging Regression) to explore non-linear relationships and interaction patterns that may remain obscured in conventional linear analyses. The results indicate that structural factors such as weekly teaching hours and academic qualifications are associated with variations in both sustainable education perceptions and work engagement. Moreover, the findings suggest a reciprocal relationship between sustainability-oriented perceptions and teacher engagement, consistent with feedback dynamics observed in complex educational systems. Rather than proposing a new theoretical framework or algorithm, the study demonstrates the utility of machine learning as a methodological tool for examining system-level interactions and emergent patterns in education, offering empirical insights that may inform sustainability-oriented practices in complex social systems. Full article
(This article belongs to the Section Systems Practice in Social Science)

11 pages, 1286 KB  
Article
Establishment and Validation of Serum Ferritin Reference Intervals Based on Real-World Big Data and Multi-Strategy Partitioning Algorithms
by Yixin Xu, Xiaojuan Wu, Junlong Zhang, Qian Niu, Bei Cai and Qiang Miao
J. Clin. Med. 2026, 15(3), 976; https://doi.org/10.3390/jcm15030976 - 26 Jan 2026
Abstract
Background/Objectives: We aimed to establish and validate population-based reference intervals (RIs) for serum ferritin (SF) using an indirect, data-driven approach based on real-world laboratory data and to optimize partitioning strategies. Methods: SF results from 29,723 apparently healthy individuals who underwent health examinations at West China Hospital between 2020 and 2024 were retrospectively analyzed. SF was measured on a Roche Cobas e801 electrochemiluminescence immunoassay platform. After Box–Cox transformation, outliers were removed using an iterative Tukey method. Potential partitioning factors were evaluated, and data-driven age cut-points were explored using decision tree regression and verified with the Harris–Boyd criteria. RIs were estimated using nonparametric percentile methods and validated in an independent cohort of 2494 individuals. Results: SF concentrations were significantly higher in males than in females (p < 0.001). In females, SF showed a significant positive association with age (r = 0.466, p < 0.001), whereas no such association was observed in males. Decision tree analysis identified 50 years as the optimal age cut-off for females (R² = 0.2467). The final study-derived RIs were 98.02–997.78 µg/L for males, 10.30–299.55 µg/L for females ≤ 50 years, and 36.61–507.00 µg/L for females > 50 years. In the validation cohort, the study-derived RIs achieved pass rates of 93.83–94.72%, which were significantly higher than the manufacturer-provided RIs (37.12–73.97%, all p < 0.001). Conclusions: Using a large health examination database and a multi-step partitioning strategy, we established robust sex- and age-specific SF RIs on the Roche Cobas e801 platform for the local population. This work provides a reproducible, generalizable framework for indirect RI determination of other biomarkers. Full article
(This article belongs to the Section Clinical Laboratory Medicine)
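
To make the idea of a data-driven age cut-point concrete, the sketch below implements the simplest case of decision tree regression: a single split chosen to minimize the total squared error of the two resulting groups. The function name `best_split` and the sample values are hypothetical and invented for illustration; the study's own analysis (Box–Cox transformation, outlier removal, Harris–Boyd verification) is not reproduced here.

```python
def best_split(ages, values):
    """Return the age threshold of a one-split regression tree.

    Each candidate threshold is the midpoint between two consecutive distinct
    ages; the one minimizing the summed squared error around the two group
    means is returned (None if no split is possible).
    """
    def sse(ys):
        # Sum of squared deviations from the group mean.
        if not ys:
            return 0.0
        mean = sum(ys) / len(ys)
        return sum((y - mean) ** 2 for y in ys)

    pairs = sorted(zip(ages, values))
    best_thr, best_err = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # identical ages cannot be separated by a threshold
        thr = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        err = sse(left) + sse(right)
        if err < best_err:
            best_thr, best_err = thr, err
    return best_thr


# Hypothetical (age, marker level) data with a jump between ages 45 and 55.
ages = [25, 30, 35, 40, 45, 55, 60, 65, 70]
levels = [40, 42, 38, 41, 39, 90, 95, 88, 92]
print(best_split(ages, levels))  # 50.0
```

A full regression tree repeats this search recursively within each group; a single split is enough to show how a cut-point emerges from the data rather than from a preset age band.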

26 pages, 9745 KB  
Article
Adulteration Detection of Multi-Species Vegetable Oils in Camellia Oil Using SICRIT-HRMS and Machine Learning Methods
by Mei Wang, Ting Liu, Han Liao, Xian-Biao Liu, Qi Zou, Hao-Cheng Liu and Xiao-Yin Wang
Foods 2026, 15(3), 434; https://doi.org/10.3390/foods15030434 - 24 Jan 2026
Abstract
We aimed to establish a rapid and precise method for identifying and quantifying multi-species vegetable oil (corn oil, olive oil (OLO), soybean oil, and sunflower oil (SUO)) adulterations in camellia oil (CAO), using soft ionization by chemical reaction in transfer–high-resolution mass spectrometry (SICRIT-HRMS) and machine learning methods. The results showed that SICRIT-HRMS could effectively characterize the volatile profiles of pure and adulterated CAO samples, including binary, ternary, quaternary, and quinary adulteration systems. The low m/z region (especially 100–300) exhibited importance to oil classification in multiple feature-selection methods. For qualitative detection, binary classification models based on convolutional neural networks (CNN), Random Forest (RF), and gradient boosting trees (GBT) algorithms showed high accuracies (98.70–100.00%) for identifying CAO adulteration under no dimensionality reduction (NON), principal component analysis (PCA), and uniform manifold approximation and projection (UMAP) strategies. The RF algorithm exhibited relatively high accuracy (96.25–99.45%) in multiclass classification. Moreover, the five models, including CNN, RF, support vector machines (SVM), logistic regression (LR), and GBT, exhibited different performances in distinguishing pure and adulterated CAO. Among 1093 blind oil samples, under NON, PCA, and UMAP: 10, 5, and 67 samples were misclassified by CNN model; 6, 7, and 41 samples were misclassified by RF model; 8, 9, and 82 samples were misclassified by SVM model; 17, 18, and 78 samples were misclassified by LR model; 7, 9, and 43 samples were misclassified by GBT model. 
For quantitative prediction, the PCA-CNN model performed optimally in predicting adulteration levels in CAO, especially with respect to OLO and SUO, exhibiting high coefficient of determination for calibration (Rc², 0.9664–0.9974) and coefficient of determination for prediction (Rp², 0.9599–0.9963) values, low root mean square error of calibration (RMSEC, 0.9–5.3%) and root mean square error of prediction (RMSEP, 1.1–5.8%) values, and RPD (5.0–16.3) values greater than 3.0. These results indicate that SICRIT-HRMS combined with machine learning can rapidly and accurately identify and quantify multi-species vegetable oil adulterations in CAO, which provides a reference for developing non-targeted and high-throughput detection methods in edible oil authenticity. Full article

18 pages, 1843 KB  
Article
Predicting Human and Environmental Risk Factors of Accidents in the Energy Sector Using Machine Learning
by Kawtar Benderouach, Idriss Bennis, Khalifa Mansouri and Ali Siadat
Appl. Sci. 2026, 16(3), 1203; https://doi.org/10.3390/app16031203 - 24 Jan 2026
Abstract
The aim of this article is to develop a machine learning (ML)-based predictive model for industrial accidents in the energy sector. The dataset used in this study was obtained from the Kaggle platform and consists of summaries derived from reports of occupational incidents resulting in injuries or deaths between 2015 and 2017. A total of 4739 accident cases were included, containing information on accident date, accident summary, degree and nature of injury, affected body part, event type, human factors, and environmental factors. Six supervised machine learning models—Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, Gradient Boosting Decision Trees (GBDT), and Extreme Gradient Boosting (XGBoost)—were developed and compared to identify the most suitable model for the data. Model performance was evaluated using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC), which were selected to ensure reliable prediction in safety-critical accident scenarios. The results indicate that XGBoost and GBDT achieve superior performance in predicting human and environmental risk factors. These findings demonstrate the potential of machine learning for improving safety management in the energy sector by identifying risk mechanisms, enhancing safety awareness, and providing quantitative predictions of fatal and non-fatal accident occurrences for integration into safety management systems. Full article
(This article belongs to the Special Issue AI in Industry 4.0)
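As a rough illustration of the evaluation metrics listed in the abstract above (accuracy, precision, recall, and F1-score), the sketch below computes them for a binary outcome from scratch. The function name and the toy labels are hypothetical and are not taken from the study; the label 1 is assumed to mark the positive class (for example, a fatal accident).

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for 0/1 labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, for illustration only.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted_labels = [1, 1, 0, 0, 0, 1, 1, 0]

metrics = binary_metrics(actual, predicted_labels)
print(metrics)
```

In a safety-critical setting, recall on the positive class is often examined alongside accuracy, because a missed high-risk case can be costlier than a false alarm.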

24 pages, 5858 KB  
Article
NADCdb: A Joint Transcriptomic Database for Non-AIDS-Defining Cancer Research in HIV-Positive Individuals
by Jiajia Xuan, Chunhua Xiao, Runhao Luo, Yonglei Luo, Qing-Yu He and Wanting Liu
Int. J. Mol. Sci. 2026, 27(3), 1169; https://doi.org/10.3390/ijms27031169 - 23 Jan 2026
Viewed by 93
Abstract
Non-AIDS-defining cancers (NADCs) have emerged as an increasingly prominent cause of non-AIDS-related morbidity and mortality among people living with HIV (PLWH). However, the scarcity of NADC clinical samples, compounded by privacy and security constraints, continues to present formidable obstacles to advancing pathological and clinical investigations. In this study, we adopted a joint analysis strategy, integrating and analyzing transcriptomic data from 12,486 PLWH and cancer patients to systematically identify potential key regulators for 23 NADCs. This effort culminated in NADCdb, a database specifically engineered for NADC pathological exploration and structured around three mechanistic frameworks rooted in the interplay of immunosuppression, chronic inflammation, carcinogenic viral infections, and HIV-derived oncogenic pathways. The “rNADC” module performed risk assessment by prioritizing genes with aberrant expression trajectories, using bidirectional stepwise regression coupled with logistic modeling to stratify the risks for 21 NADCs. The “dNADC” module combined patients’ dysregulated genes with their regulatory networks, using Random Forest (RF) and Conditional Inference Trees (CITs) to identify pathogenic drivers of NADCs with an accuracy exceeding 75% (in the external validation cohort, the prediction accuracy of the HIV-associated clear cell renal cell carcinoma model exceeded 90%). Meanwhile, “iPredict” identified 1905 key immune biomarkers for 16 NADCs based on the distinct immune statuses of patients. Importantly, we conducted multi-dimensional profiling of these key determinants, including in-depth functional annotations, phenotype correlations, protein–protein interaction (PPI) networks, TF-miRNA-target regulatory networks, and drug prediction, to dissect their mechanistic roles in NADC pathogenesis.
In summary, NADCdb serves as a novel, centralized resource that integrates data and provides analytical frameworks, offering fresh perspectives and a valuable platform for the scientific exploration of NADCs. Full article
(This article belongs to the Special Issue Novel Molecular Pathways in Oncology, 3rd Edition)

31 pages, 27773 KB  
Article
Machine Learning Techniques for Modelling the Water Quality of Coastal Lagoons
by Juan Marcos Lorente-González, José Palma, Fernando Jiménez, Concepción Marcos and Angel Pérez-Ruzafa
Water 2026, 18(3), 297; https://doi.org/10.3390/w18030297 - 23 Jan 2026
Viewed by 248
Abstract
This study evaluates the performance of several machine learning models in predicting dissolved oxygen concentration in the surface layer of the Mar Menor coastal lagoon. In recent years, this ecosystem has suffered a continuous process of eutrophication and episodes of hypoxia, mainly due to continuous influx of nutrients from agricultural activities, causing severe water quality deterioration and mortality of local flora and fauna. In this context, monitoring the ecological status of the Mar Menor and its watershed is essential to understand the environmental dynamics that trigger these dystrophic crises. Using field data, this study evaluates the performance of eight predictive modelling approaches, encompassing regularised linear regression methods (Ridge, Lasso, and Elastic Net), instance-based learning (k-nearest neighbours, KNN), kernel-based regression (support vector regression with a radial basis function kernel, SVR-RBF), and tree-based ensemble techniques (Random Forest, Regularised Random Forest, and XGBoost), under multiple experimental settings involving spatial variability and varying time lags applied to physicochemical and meteorological predictors. The results showed that incorporating time lags of approximately two weeks in physicochemical variables markedly improves the models’ ability to generalise to new data. Tree-based regression models achieved the best overall performance, with eXtreme Gradient Boosting providing the highest evaluation metrics. Finally, analysing predictions by sampling point reveals spatial patterns, underscoring the influence of local conditions on prediction quality and the need to consider both spatial structure and temporal inertia when modelling complex coastal lagoon systems. Full article
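The time-lag idea described in the abstract above can be sketched as follows: each target observation is paired with the predictor value recorded a fixed number of steps earlier. This is only an illustration under stated assumptions. The function name and the toy series are hypothetical, and daily sampling is assumed so that a lag of 14 steps stands in for roughly two weeks; the abstract does not give the sampling frequency.

```python
def make_lagged_pairs(predictor, target, lag):
    """Pair predictor[t - lag] with target[t] for every t where both exist."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if len(predictor) != len(target):
        raise ValueError("predictor and target must have the same length")
    return [(predictor[t - lag], target[t]) for t in range(lag, len(target))]


# Hypothetical daily series, for illustration only.
temperature = list(range(20))                  # predictor, one value per day
dissolved_oxygen = [100 + t for t in range(20)]  # target, one value per day

# Assumed: 14 daily steps approximate a two-week lag.
pairs = make_lagged_pairs(temperature, dissolved_oxygen, lag=14)
print(len(pairs), pairs[0], pairs[-1])
```

The first `lag` target observations are dropped because no lagged predictor exists for them, so longer lags leave fewer training examples.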