Search Results (302)

Search Parameters:
Keywords = random forest-recursive feature elimination

20 pages, 9647 KB  
Article
CCL2 and PAK6 as Candidate Biomarkers of Neuroinflammation in Parkinson’s Disease: An Integrated Machine Learning and Single-Nucleus Transcriptomic Study
by Qixin Zhu, Zhen Zhang, Leiming Zhang, Qian Li, Ting Zhang and Fei Yang
Brain Sci. 2026, 16(5), 463; https://doi.org/10.3390/brainsci16050463 - 25 Apr 2026
Abstract
Background: Neuroinflammation is recognized as a key contributor to Parkinson’s disease (PD), but the relationships between inflammatory signaling, immune-state alterations, and cell-type-specific transcriptional programs remain unclear. Methods: Public transcriptomic datasets, including GSE20141 (discovery cohort) and the substantia nigra subset of GSE114517 (external validation cohort), were analyzed. Genes identified by exploratory differential-expression screening in the discovery cohort were intersected with predefined inflammation- and chemokine-related gene sets to define a candidate space for downstream prioritization. Protein–protein interaction, Gene Ontology, KEGG, and immune-signature analyses were performed, followed by machine learning-based feature prioritization using Elastic Net, support vector machine-recursive feature elimination, and random forest. Prioritized candidates were further evaluated by cross-platform validation, single-nucleus transcriptomic mapping, and a hypothesis-generating in silico perturbation analysis in PD astrocytes. Results: Seventeen genes were retained at the intersection of PD-related differentially expressed genes and inflammation-/chemokine-associated gene sets. These candidates formed a response module enriched in mitochondrial organization, oxidative phosphorylation, and mitophagy pathways. Immune-signature analysis suggested an altered transcriptome-derived immune landscape in PD, with changes in NK cell-related signatures and significant correlations between immune-state scores and the candidate genes. Machine learning-based prioritization yielded five shared candidates, of which only CCL2 and PAK6 showed same-direction support with nominal significance in the external validation cohort. Single-nucleus transcriptomic analysis localized CCL2 predominantly to astrocytes, whereas PAK6 was more strongly associated with neuronal populations, particularly OTX2-positive ventral midbrain neurons. 
In silico perturbation analysis further predicted that CCL2 suppression in PD astrocytes may be associated with translational- and ribosome-related regulatory programs. Conclusions: CCL2 and PAK6 emerged as prioritized candidate biomarkers associated with PD-related inflammatory and chemokine-linked transcriptional alterations in the substantia nigra. More broadly, this study provides a multi-layered framework for candidate prioritization, cross-platform validation, and cell-type-level contextualization in PD neuroinflammation. Because the study is computational and the perturbation analysis is predictive, orthogonal experimental validation will be required to determine whether CCL2 and PAK6 are biomarkers of disease-associated transcriptional states, functional contributors to PD pathogenesis, or both.
(This article belongs to the Section Neurodegenerative Diseases)
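The three-way feature prioritization this abstract describes (Elastic Net, SVM-RFE, and random forest, intersected to a shared candidate set) can be sketched with scikit-learn. The snippet below is a generic illustration on synthetic data; all settings (sample counts, k = 5, regularization strengths) are illustrative assumptions, not the study's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Synthetic stand-in for the 17-gene candidate matrix (samples x genes).
X, y = make_classification(n_samples=120, n_features=17, n_informative=5,
                           random_state=0)

k = 5  # number of candidates each method retains (illustrative)

# Elastic Net-penalized logistic regression: rank genes by |coefficient|.
enet = LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5,
                          C=1.0, max_iter=5000, random_state=0).fit(X, y)
enet_top = set(np.argsort(np.abs(enet.coef_[0]))[-k:])

# SVM-RFE: recursively drop the weakest features of a linear SVM.
svm_rfe = RFE(LinearSVC(C=1.0, max_iter=10000, random_state=0),
              n_features_to_select=k).fit(X, y)
svm_top = set(np.flatnonzero(svm_rfe.support_))

# Random forest: rank genes by impurity-based importance.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
rf_top = set(np.argsort(rf.feature_importances_)[-k:])

# Shared candidates = genes selected by all three methods.
shared = enet_top & svm_top & rf_top
```

The intersection step mirrors how the study narrows many differentially expressed genes down to a handful of shared candidates before external validation.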
26 pages, 1507 KB  
Article
Transcriptomic Profiling Combined with Machine Learning and Mendelian Randomization Identifies Diagnostic Biomarkers and Immune Infiltration Patterns in Diabetic Kidney Disease
by Haiwen Liu, Qiang Fu and Jing Chen
Molecules 2026, 31(9), 1390; https://doi.org/10.3390/molecules31091390 - 23 Apr 2026
Abstract
Diabetic kidney disease (DKD) affects approximately 40% of patients with diabetes mellitus and remains a leading cause of end-stage renal disease worldwide. Early diagnosis and identification of therapeutic targets are critical for improving patient outcomes, yet reliable biomarkers are lacking. This study integrated transcriptomic data from the Gene Expression Omnibus (GEO) database (GSE96804, GSE30528, and GSE142025) with machine learning algorithms and Mendelian randomization (MR) to identify diagnostic biomarkers for DKD. Differentially expressed genes (DEGs) were identified and intersected with key modules from weighted gene co-expression network analysis (WGCNA). Four machine learning methods—least absolute shrinkage and selection operator (LASSO), random forest (RF), support vector machine-recursive feature elimination (SVM-RFE), and extreme gradient boosting (XGBoost)—were applied for feature selection. Five hub genes (SPP1, CD44, VCAM1, C3, and TIMP1) were identified at the intersection of these approaches. Two-sample MR analysis using eQTL data from the eQTLGen Consortium and kidney function GWAS from the CKDGen Consortium provided evidence supporting potential causal associations between SPP1, C3, and TIMP1 expression and estimated glomerular filtration rate decline. Immune infiltration analysis via CIBERSORT estimated elevated proportions of M1 macrophages and activated CD4+ memory T cells in DKD samples, with all five hub genes showing correlations with macrophage infiltration. A diagnostic model based on these five genes achieved a cross-validated area under the receiver operating characteristic curve (CV-AUC) of 0.938 in the discovery dataset and AUC values of 0.917 and 0.889 in two independent external validation cohorts. Drug–gene interaction analysis identified 10 candidate compounds targeting the hub genes. 
These findings provide a computational framework for identifying candidate diagnostic biomarkers and generating hypotheses regarding potential therapeutic targets for DKD; however, all results are derived from in silico analyses and require experimental validation—including qPCR, immunohistochemistry, and prospective clinical cohort studies—before clinical applicability can be established.
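A cross-validated AUC like the one reported here can be estimated honestly with out-of-fold predictions, so that no sample is ever scored by a model trained on it. A minimal sketch with synthetic stand-in data for the five-gene panel (the classifier and fold count are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for expression of five hub genes across samples.
X, y = make_classification(n_samples=150, n_features=5, n_informative=4,
                           n_redundant=0, random_state=42)

# Out-of-fold predicted probabilities give an honest CV-AUC estimate:
# each sample is scored by a model that never saw it during training.
model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
cv_auc = roc_auc_score(y, proba)
```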

26 pages, 5646 KB  
Article
Study on Early Pregnancy Diagnosis of Sows Based on Body Fluid Metabolite Detection Combined with Machine Learning Models
by Yun Feng, Ruonan Gao, Wengang Yang, Huiwen Lu, Weizeng Sun, Yun Zhang, Yujun Ren, Liming Gao, Mengxun Li, Qingchun Li, Guang Pu, Yongsheng Zhang, Zikai Ai, Kun Yan and Tao Huang
Vet. Sci. 2026, 13(5), 409; https://doi.org/10.3390/vetsci13050409 - 22 Apr 2026
Abstract
The conventional window for ultrasonic pregnancy diagnosis in sows is 22–25 days post-insemination, which often results in missed opportunities for the optimal re-insemination of non-pregnant sows and elevated production costs. The present study aimed to establish an early pregnancy detection method for sows at 12–18 days post-insemination, thereby providing a reference for efficient reproductive management. Saliva, urine and vaginal secretions were collected from sows during this period, and seven metabolites were quantified. Seven machine learning models were employed for data analysis, after which the optimal combination was determined, and the detection protocol was refined using recursive feature elimination. The results revealed that the majority of metabolites in saliva and urine differed significantly between pregnant and non-pregnant groups (p < 0.05). Among the models evaluated, the random forest algorithm exhibited the best predictive performance, with accuracy ranging from 0.59 to 1.00. Saliva sampled at 17 days post-insemination was identified as the optimal diagnostic medium, and 100% prediction accuracy was achieved by measuring only three metabolites: Glc, Ste, and Xan. The diagnostic approach established in this study allows pregnancy detection 5–8 days earlier than conventional methods, with the additional benefits of non-invasive sampling and minimal stress to sows. Accordingly, it provides a novel reference for enhancing the efficiency of swine production.
(This article belongs to the Special Issue Advances in Animal Reproductive Biology and Technologies)
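Refining a detection protocol down to a minimal metabolite panel is a standard use of recursive feature elimination. A hedged sketch, assuming a random forest base learner and synthetic data (the metabolite names beyond Glc, Ste, and Xan are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for seven metabolite measurements per sow;
# labels mark pregnant vs. non-pregnant animals.
X, y = make_classification(n_samples=80, n_features=7, n_informative=3,
                           n_redundant=0, random_state=1)
metabolites = ["Glc", "Ste", "Xan", "m4", "m5", "m6", "m7"]  # m4-m7 hypothetical

# RFE with a random forest: repeatedly refit and drop the least
# important metabolite until only three remain.
rfe = RFE(RandomForestClassifier(n_estimators=200, random_state=1),
          n_features_to_select=3, step=1).fit(X, y)
selected = [m for m, keep in zip(metabolites, rfe.support_) if keep]
```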

40 pages, 1792 KB  
Article
An Effective Model-Based Voting Classifier for Diabetes Mellitus Classification
by Diyar Qader Zeebaree, Merdin Shamal Salih, Danial William Odeesho, Dilovan Asaad Zebari, Nechirvan Asaad Zebari, Omar I. Dallal Bashi, Reving Masoud Abdulhakeem and Yahya Ahmed Yahya
Bioengineering 2026, 13(4), 480; https://doi.org/10.3390/bioengineering13040480 - 21 Apr 2026
Abstract
Diabetes mellitus is a rapidly growing health issue that affects more than 347 million people worldwide. The disease can be successfully detected in its early stages, enabling physicians to avoid complications and improve patient outcomes. Although machine learning (ML) has been extensively used in diabetes classification, available solutions tend to place little or no emphasis on feature selection and ensembles, which limits prediction accuracy and generalizability. In this study, we introduce a hybrid framework based on three feature-selection algorithms, namely genetic algorithm (GA), correlation-based feature selection (CFS) and recursive feature elimination (RFE), in single and hybrid forms, and three classifiers, namely multi-layer perceptron (MLP), support vector machine (SVM) and random forest (RF), to achieve greater predictive robustness with the aid of soft voting. Experimental findings obtained from a benchmark diabetes dataset indicate that the RFE + CFS + SVM combination achieves the best performance, with an accuracy of 98.0%, sensitivity of 97.43%, specificity of 99.03%, precision of 99.51% and F1-score of 98.72%. These results indicate that the suggested hybrid feature-selection and ensemble learning model offers a robust and highly effective approach for early-stage diabetes diagnosis, one which clinicians may use to make timely and accurate decisions.
(This article belongs to the Section Biosignal Processing)
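Soft voting over MLP, SVM, and RF, as used in this framework, averages the predicted class probabilities of the base learners; note that SVC must be constructed with probability=True to expose predict_proba. A self-contained sketch on synthetic data (all hyperparameters illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a benchmark diabetes dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=7, stratify=y)

# Soft voting averages predicted class probabilities across learners;
# MLP and SVM get feature scaling, which the random forest does not need.
clf = VotingClassifier(
    estimators=[
        ("mlp", make_pipeline(StandardScaler(),
                              MLPClassifier(max_iter=2000, random_state=7))),
        ("svm", make_pipeline(StandardScaler(),
                              SVC(probability=True, random_state=7))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=7)),
    ],
    voting="soft",
).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```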

23 pages, 13020 KB  
Article
Identification of Key Osteoarthritis-Associated Genes Based on DNA Methylation
by Jian Zhao, Changwu Wu, Zhejun Kuang, Han Wang and Lijuan Shi
Int. J. Mol. Sci. 2026, 27(8), 3388; https://doi.org/10.3390/ijms27083388 - 9 Apr 2026
Abstract
Osteoarthritis (OA) is a complex degenerative joint disease for which early diagnosis and clear molecular characterization remain limited. DNA methylation has been increasingly recognized as an important regulatory factor in OA pathogenesis. In this study, we proposed an integrative computational framework combining statistical analysis, machine learning, deep learning, and functional genomics to identify and validate OA-associated genes and methylation biomarkers for diagnostic and biological interpretation. Candidate CpG sites were obtained using two complementary strategies: differential methylation analysis and selection of loci located near transcription start sites of previously reported OA-related genes. Key features were further refined using support vector machine recursive feature elimination and random forest algorithms. Based on the selected loci, we developed a feature-fusion diagnostic model that combines Transformer and convolutional neural networks with adaptive weighting to capture both global dependency structures and local methylation patterns. A panel of 220 methylation sites demonstrated stable and reproducible diagnostic performance in an independent cohort. Functional annotation and pathway analysis highlighted several established OA-associated genes, including TGFBR2, SMAD3, PPARG, and MAPK3, and suggested INHBB as a potential novel effector gene, with additional support for AMH and INHBE involvement. Overall, this study presents a robust methylation-based framework for identifying key OA-associated genes and provides new insights into the epigenetic mechanisms underlying OA.
(This article belongs to the Section Molecular Genetics and Genomics)
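Support vector machine recursive feature elimination, one of the refinement steps named above, drops the feature with the smallest squared weight of a linear SVM at each iteration. A generic sketch on a synthetic methylation-like matrix (dimensions and the target count are assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in for CpG methylation values (samples x CpG sites).
X, y = make_classification(n_samples=100, n_features=30, n_informative=6,
                           random_state=3)

# SVM-RFE: a linear-kernel SVM is refit repeatedly, eliminating the
# feature with the smallest squared weight at each step.
rfe = RFE(SVC(kernel="linear", C=1.0),
          n_features_to_select=10, step=1).fit(X, y)

# ranking_ is 1 for kept features; larger values were dropped earlier.
kept = np.flatnonzero(rfe.support_)
```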

36 pages, 13078 KB  
Article
Spatial Expansion and Driving Mechanisms of the Yangtze River Delta, Based on RF-RFECV Feature Selection and Night-Time Light Remote Sensing Data
by Dandan Shao, KyungJin Zoh and Huiyuan Liu
Remote Sens. 2026, 18(7), 1033; https://doi.org/10.3390/rs18071033 - 30 Mar 2026
Abstract
Rapid urbanization has promoted socioeconomic growth but has exacerbated spatial-structure imbalances. This study investigates 41 prefecture-level cities in the Yangtze River Delta (YRD) from 2010 to 2022. Using nighttime light data, we compute the Comprehensive Nighttime Light Index (CNLI) to track urbanization dynamics and delineate built-up areas. Furthermore, we apply random-forest recursive feature elimination with cross-validation (RF-RFECV) and a Shapley additive explanations (SHAP)-based interpretation framework to quantify the spatiotemporal evolution of urbanization drivers. The results indicate that urbanization in the YRD increased steadily overall during the study period. Shanghai maintained its core leadership, Jiangsu and Zhejiang advanced steadily, and Anhui rapidly caught up driven by regional integration policies. Although regional disparities generally converged, persistent absolute gaps in small and medium-sized cities and inland areas remain a prominent challenge to balanced development. Spatially, urbanization exhibits a gradient differentiation of “higher in the east and lower in the west, and higher along rivers and coasts than inland.” The regional spatial structure gradually shifted from an early “pole-core–belt” pattern to a polycentric and networked urban agglomeration system, with metropolitan areas and economic belts serving as important carriers for promoting spatial balance. Furthermore, built-up areas exhibit a trajectory of “core agglomeration, corridor-oriented expansion, and intensive transition.” The shrinking coverage of the standard deviational ellipse and a slowdown in expansion rates suggest a shift from extensive outward sprawl to more concentrated development. Regarding driving mechanisms, YRD urbanization has evolved from early-stage factor-scale expansion to a later-stage efficiency- and innovation-driven trajectory. 
While population density remained the dominant driver, early-stage reliance on transport infrastructure and fiscal decentralization was largely replaced by the strengthening effects of per capita output and green innovation. Overall, these findings provide empirical evidence for optimizing spatial patterns and designing differentiated policies for high-quality urbanization in the YRD.
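RF-RFECV, the selection method named in the title, wraps recursive elimination in cross-validation so the number of retained drivers is chosen by CV score rather than fixed in advance. A minimal sketch with synthetic driver variables (the SHAP interpretation step is omitted; all sizes are assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold

# Synthetic stand-in for city-level driver variables vs. an urbanization index.
X, y = make_regression(n_samples=120, n_features=12, n_informative=4,
                       noise=5.0, random_state=5)

# RFECV wraps RFE in cross-validation and keeps the feature count that
# maximizes mean CV score (R^2 for regressors), rather than a fixed k.
selector = RFECV(RandomForestRegressor(n_estimators=100, random_state=5),
                 step=1, cv=KFold(n_splits=3, shuffle=True, random_state=5),
                 min_features_to_select=2).fit(X, y)
n_kept = selector.n_features_
```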

25 pages, 2766 KB  
Article
Towards Safer Automated Driving: Predicting Drivers with Long Takeover Time Using Random Forest and Human Factors
by Jungsook Kim and Ohyun Jo
Electronics 2026, 15(7), 1390; https://doi.org/10.3390/electronics15071390 - 26 Mar 2026
Abstract
In highly automated driving systems (ADSs), drivers’ ability to resume manual driving remains a road safety issue. However, to the best of our knowledge, there is no existing computational model to predict which drivers require more than the 4 seconds mandated by United Nations Regulation No. 157 to regain manual control. To address this challenge, we developed a Random Forest model that predicts takeover time using measurable human factors. Three controlled driving simulator experiments were conducted in which participants engaged in distinct tasks—texting, drinking, and traffic monitoring—before responding to a takeover request. During the experiments, we collected human factor features, including gaze behavior, age, and scores from the self-reported driving behavior questionnaire (K-DBQ). The Random Forest classifier achieved 77% accuracy. Recursive feature elimination selected 10 dominant predictors; notably, engaging in non-driving-related tasks, reduced on-road gaze, and older age were significantly associated with longer takeover times. Although K-DBQ scores were not directly correlated with takeover time, their inclusion improved model robustness, consistent with ensemble learning from weak yet complementary signals. The proposed model can be integrated into advanced driver assistance systems (ADASs) to proactively identify drivers likely to exceed the 4-second takeover window, support targeted interventions, and enhance human-centered transition safety in ADSs.
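Pairing RFE with a random forest to keep a fixed number of dominant predictors, then scoring by cross-validation, can be sketched as follows; the data are synthetic and the label is only a stand-in for the "takeover time > 4 s" class:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for human-factor features (gaze, age, questionnaire
# scores); the label marks drivers whose takeover time exceeds 4 s.
X, y = make_classification(n_samples=150, n_features=20, n_informative=6,
                           random_state=11)

# Keep the 10 dominant predictors, then estimate accuracy by cross-validation.
rf = RandomForestClassifier(n_estimators=200, random_state=11)
rfe = RFE(rf, n_features_to_select=10).fit(X, y)
X_sel = X[:, rfe.support_]
acc = cross_val_score(rf, X_sel, y, cv=5, scoring="accuracy").mean()
```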

15 pages, 8130 KB  
Article
Integrative Machine Learning Framework for Epigenetic Biomarker Discovery and Disease Severity Prediction in Childhood Atopic Dermatitis
by Ding-Wei Chen and Yun-Nan Chang
Big Data Cogn. Comput. 2026, 10(4), 101; https://doi.org/10.3390/bdcc10040101 - 24 Mar 2026
Abstract
Atopic dermatitis (AD) is a chronic inflammatory skin disorder to which epigenetic factors contribute significantly. We developed a machine learning-based framework to identify DNA methylation biomarkers associated with AD classification and severity. Genome-wide methylation data from peripheral blood were processed using four feature selection algorithms: coarse approximation linear function (CALF), elastic net (EN), minimum redundancy maximum relevance (mRMR), and recursive feature elimination with cross-validation (RFECV). The integrative framework identified a central panel of 8 CpG sites that achieved an area under the curve (AUC) of 1.00 in the test set. This panel demonstrated high disease specificity, showing poor classification performance for systemic lupus erythematosus (AUC = 0.46), Crohn’s disease (AUC = 0.50), and oral squamous cell carcinoma (AUC = 0.58). Severity prediction using the 63 RFECV-selected CpG sites (RFE63) achieved high accuracy across classifiers, with Random Forest (accuracy = 0.94) outperforming the others. The functional enrichment of CpG-associated genes highlighted key immune-related transcriptional regulators, including STAT5A, RUNX1, MEIS1, and PAX4. These genes are linked to chromatin remodeling, T helper cell differentiation, and interleukin-2 regulation, which are critical in AD pathogenesis and severity. Our findings demonstrate the utility of machine learning-integrated epigenomics in identifying robust, disease-specific biomarkers for AD diagnosis and monitoring, offering new insights into the molecular mechanisms underlying childhood AD. However, further validation in large-scale independent cohorts is required to confirm their clinical robustness and generalizability.

25 pages, 72089 KB  
Article
Soil Salinity Assessment and Cross-Regional Validation Based on Multiple Feature Optimization Methods and SHAP
by Shuaishuai Shi, Yu Wang, Jiawen Wang, Jibang Yang, Zijin Bai and Jie Peng
Remote Sens. 2026, 18(6), 955; https://doi.org/10.3390/rs18060955 - 23 Mar 2026
Abstract
Soil salinity severely threatens global ecosystems and agriculture, making accurate monitoring an ongoing priority. Currently, efficiently utilizing multi-source datasets to enhance monitoring accuracy while minimizing computational resources remains a critical challenge. This study evaluated several modeling strategies, including full-dataset modeling, variance inflation factor (VIF), Boruta, particle swarm optimization, ant colony optimization and recursive feature elimination (RFE), and validated results across diverse regions (Almaty, Kazakhstan; Shandong, China). We further validated the results using multiple algorithms, including linear regression, partial least squares regression, extreme gradient boosting, k-nearest neighbor and random forest (RF), with topsoil (0–20 cm) electrical conductivity inverted via the optimal method. Results indicate that input feature numbers substantially impact model performance: regional-scale feature selection is indispensable, with RFE outperforming full-dataset modeling (R2 improves by up to 0.28, while RMSE decreases by 2.21 dS m−1) and VIF performing the worst. Transferability is also demonstrated in Almaty and Shandong. Additionally, the RF algorithm shows superior performance in soil salinity mapping (overall accuracy = 0.73; kappa coefficient = 0.65). Moreover, the RFE and SHAP results highlight CRSI, BI, and MSAVI2 as particularly important predictors for estimating soil salinity in our study area. Collectively, this study highlights the critical importance of feature optimization and interpretability in soil attribute mapping through the integration of multi-source remote sensing data.
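The comparison between full-dataset modeling and RFE-based selection can be reproduced generically: fit the same learner once with all covariates and once with the RFE-selected subset, then compare R2 and RMSE on held-out data. A sketch on synthetic data (feature counts and the train/test split are assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for multi-source covariates vs. topsoil EC.
X, y = make_regression(n_samples=200, n_features=25, n_informative=5,
                       noise=10.0, random_state=9)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=9)

rf = RandomForestRegressor(n_estimators=100, random_state=9)

# Full-dataset model: all 25 covariates.
full_pred = rf.fit(X_tr, y_tr).predict(X_te)

# RFE model: same learner restricted to the 5 top-ranked covariates.
rfe = RFE(RandomForestRegressor(n_estimators=100, random_state=9),
          n_features_to_select=5).fit(X_tr, y_tr)
rfe_pred = rf.fit(X_tr[:, rfe.support_], y_tr).predict(X_te[:, rfe.support_])

r2_full, r2_rfe = r2_score(y_te, full_pred), r2_score(y_te, rfe_pred)
rmse_rfe = mean_squared_error(y_te, rfe_pred) ** 0.5  # version-safe RMSE
```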

23 pages, 3294 KB  
Article
Evaluating Disturbance Regime Stratification for Aboveground Biomass Estimation in a Heterogeneous Forest Landscape: Insights from the Atewa Landscape, Ghana
by Lukman B. Adams and Yuichi S. Hayakawa
Remote Sens. 2026, 18(5), 765; https://doi.org/10.3390/rs18050765 - 3 Mar 2026
Abstract
Optical and passive remote sensing-based estimation of aboveground biomass (AGB) using forest structural stratification has shown improvements over global models. This study investigated whether stratification by human-mediated disturbances improves prediction accuracy. Disturbance variables included proximity to mines, roads, and settlements, evaluated across three regimes: the full Atewa landscape (“FSR”), the Atewa Range Forest Reserve (“FR”), and the surrounding disturbed area (“SR”). Predictor selection for regimes was performed using recursive feature elimination with cross-validation, applied to random forest (RF) and support vector machine (SVM) algorithms. AGB was then estimated using local, global, and retuned global models, and the results were compared using the coefficient of determination (r2) and root mean square error (RMSE). The global RF model achieved the best performance (r2 = 0.54; RMSE = 57.71 Mg/ha), likely due to structured heterogeneity captured across combined regimes. The “SR” models, however, performed poorly, indicating that excessive unstructured heterogeneity introduces noise and redundancy that weaken predictions. The low performance of the “FR” regime was attributed to spectral saturation and limited variance in observed AGB. Although disturbance factors added minimal bias, heteroscedasticity was evident in the “SR” and “FSR” regimes. Overall, this study indicates that disturbance-based stratification may not necessarily improve AGB estimation accuracy compared to global models. However, it highlights the value of disturbance information for AGB modeling in heterogeneous forest landscapes.
(This article belongs to the Section Forest Remote Sensing)

42 pages, 6154 KB  
Article
A Novel Hybrid Opcode Feature Selection Framework for Efficient and Effective IoT Malware Detection
by Bakhan Tofiq Ahmed, Noor Ghazi M. Jameel and Bakhtiar Ibrahim Saeed
IoT 2026, 7(1), 24; https://doi.org/10.3390/iot7010024 - 2 Mar 2026
Abstract
Malware’s proliferation in the Internet of Things (IoT) ecosystem requires precise, efficient detection systems capable of operating on IoT devices. Existing static analysis approaches often fail due to computational inefficiency stemming from high feature dimensionality inherent in raw opcode features. This research addresses this limitation by proposing a novel machine-learning (ML)-driven Intelligent Hybrid Feature Selection (IHFS) framework with two distinct architectures. IHFS1 combines a filter method (variance threshold) with an embedded method (LGBM feature importance). Conversely, IHFS2 integrates variance thresholding with a wrapper method (Recursive Feature Elimination with Cross-Validation using LGBM) for optimal selection. This framework is specifically designed to select an optimally stable and minimal feature subset from the initial 1183 opcode frequency vector extracted from ARM binaries. Applying this framework to a multi-family IoT malware dataset, the IHFS architectures yielded distinct and highly efficient feature subsets: IHFS1 achieved a 95.77% reduction (to 50 features), while IHFS2 attained a 98.06% reduction (to 23 features). Evaluation across eight ML models confirmed that the Random Forest (with IHFS1 subset) and Decision Tree (with IHFS2 subset) classifiers were the best performing, achieving robust classification metrics that outperform current state-of-the-art solutions. The Decision Tree model demonstrated exceptional detection capabilities, with an accuracy of 99.87%, a precision of 99.82%, a recall of 99.88%, and an F1-score of 99.85%. It achieved an average inference time of 0.058 ms per sample. Experimental results attained on a native ARM64 environment validate the deployment feasibility of the proposed system for resource-constrained IoT devices, such as the Raspberry Pi. 
The proposed system achieves a high-throughput, low-overhead security posture while maintaining host operational stability, processing a single ELF binary in just 3.431 ms.
(This article belongs to the Special Issue Cybersecurity in the Age of the Internet of Things)
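IHFS2's filter-plus-wrapper design (variance threshold, then RFECV with a boosted-tree learner) maps naturally onto a scikit-learn Pipeline. In this hedged sketch, scikit-learn's GradientBoostingClassifier stands in for LGBM, and the opcode-frequency matrix is synthetic; the dimensions are far smaller than the 1183-opcode vector in the study:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFECV, VarianceThreshold
from sklearn.pipeline import Pipeline

# Synthetic stand-in for an opcode-frequency matrix (samples x opcodes).
X, y = make_classification(n_samples=200, n_features=40, n_informative=6,
                           random_state=13)

pipe = Pipeline([
    # Stage 1 (filter): drop opcode counts with zero variance.
    ("var", VarianceThreshold(threshold=0.0)),
    # Stage 2 (wrapper): RFECV with the boosted-tree learner, eliminating
    # five features per step and choosing the subset by CV accuracy.
    ("rfecv", RFECV(GradientBoostingClassifier(random_state=13),
                    step=5, cv=3, min_features_to_select=5)),
])
pipe.fit(X, y)
n_selected = pipe.named_steps["rfecv"].n_features_
```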

42 pages, 1422 KB  
Article
Exploring Handwriting-Based Biomarkers for Alzheimer’s Disease: Identifying Discriminative Features and Tasks to Enhance Diagnostic Accuracy
by Cansu Akyürek Anacur, Asuman Günay Yılmaz and Bekir Dizdaroğlu
Diagnostics 2026, 16(5), 697; https://doi.org/10.3390/diagnostics16050697 - 26 Feb 2026
Abstract
Background/Objectives: This study proposes a comprehensive classification framework for the automatic detection of Alzheimer’s disease using handwriting data. An enriched feature space is constructed by combining 18 baseline features extracted from raw handwriting signals with 30 additional features derived from established handwriting analysis studies, resulting in a total of 48 features. To enhance clinical practicality, a task reduction analysis is conducted by comparing the full dataset containing 25 handwriting tasks with a reduced dataset comprising 14 selected tasks. Methods: The proposed framework employs a two-stage evaluation strategy involving four feature selection methods (Random Forest Feature Importance, Extreme Gradient Boosting Feature Importance, L1 Regularization and Recursive Feature Elimination), three normalization techniques (Unnormalized, Min–Max and Z-Score), and five baseline machine learning classifiers (Random Forest, Logistic Regression, Multilayer Perceptron, XGBoost and Support Vector Machines). In the second stage, a dynamic ensemble learning strategy is introduced, where the most effective classifiers are adaptively selected for each cross-validation fold and integrated using soft and hard voting schemes. Results: The experimental results demonstrate that reducing the number of tasks leads to an improvement in average classification accuracy from 79.47% to 81.03%, while simultaneously decreasing training time and memory consumption by approximately 40% and 35%, respectively. The highest classification performance, achieving an accuracy of 94.20%, is obtained using the Hard Ensemble combined with L1-based feature selection. Conclusions: These findings highlight that the joint use of enriched feature representations, task reduction, and dynamic ensemble learning provides an effective and computationally efficient solution for handwriting-based Alzheimer’s disease detection.
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
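The two-stage design described in the abstract (feature selection, then a voting ensemble over baseline classifiers) can be sketched with scikit-learn. This is a minimal illustration, not the authors' implementation: the synthetic data, the L1 regularization strength, and the three-member ensemble are all assumptions standing in for the 48-feature handwriting representation and the paper's full classifier pool.

```python
# Hedged sketch of the two-stage idea: Min-Max normalization, L1-based
# feature selection, then a hard-voting ensemble of baseline classifiers.
# The synthetic data and all hyperparameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 48-feature handwriting representation.
X, y = make_classification(n_samples=300, n_features=48, n_informative=10,
                           random_state=0)

# Stage 1: L1 regularization zeroes out uninformative features.
l1_select = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5, random_state=0))

# Stage 2: hard (majority) voting over three baseline classifiers.
ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC(random_state=0))],
    voting="hard")

pipe = make_pipeline(MinMaxScaler(), l1_select, ensemble)
scores = cross_val_score(pipe, X, y, cv=5)  # fold-wise accuracy
print(round(scores.mean(), 3))
```

The paper's dynamic variant goes further by re-selecting the best classifiers per cross-validation fold; the fixed ensemble above only shows the voting mechanics.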
30 pages, 1973 KB  
Article
Human-Centered AI Perception Prediction in Construction: A Regularized Machine Learning Approach for Industry 5.0
by Annamária Behúnová, Matúš Pohorenec, Tomáš Mandičák and Marcel Behún
Appl. Sci. 2026, 16(4), 2057; https://doi.org/10.3390/app16042057 - 19 Feb 2026
Abstract
Industry 5.0 emphasizes human-centered integration of artificial intelligence in industrial contexts, yet successful adoption depends critically on workforce perception and acceptance. This research develops and validates a machine learning framework for predicting AI-related perceptions and expected impacts in the construction industry under small sample constraints typical of specialized industrial surveys. Specifically, the study aims to develop and empirically validate a predictive AI decision support model that estimates the expected impact of AI adoption in the construction sector based on digital competencies, ICT utilization, AI training and experience, and AI usage at both individual and organizational levels, operationalized through a composite AI Impact Index and two process-oriented outcomes (perceived task automation and perceived cost reduction). Using a dataset of 51 survey responses from Slovak construction professionals collected in 2025, we implement a methodologically rigorous approach specifically designed for limited-data regimes. The framework encompasses ordinal target simplification from five to three classes, dimensionality reduction through theoretically grounded composite indices reducing features from 15 to 7, exclusive deployment of low-variance regularized models, and leave-one-out cross-validation for unbiased performance estimation. The optimal model (Lasso regression with recursive feature elimination) predicts cost reduction perception with R2 = 0.501, MAE = 0.551, and RMSE = 0.709, while six classification targets achieve weighted F1 = 0.681, representing statistically optimal performance given sample constraints and perception measurement variability. Comparative evaluation confirms regularized models outperform high-variance alternatives: random forest (R2 = 0.412) and gradient boosting (R2 = 0.292) exhibit substantially lower generalization performance, empirically validating the bias-variance trade-off rationale. 
Key methodological contributions include explicit bias-variance optimization preventing overfitting, feature selection via RFE reducing input space to six predictors (personal AI usage, AI impact on budgeting, ICT utilization, AI training, company size, and age), and demonstration that principled statistical approaches achieve meaningful predictions without requiring large-scale datasets or complex architectures. The framework provides a replicable blueprint for perception and impact prediction in data-constrained Industry 5.0 contexts, enabling targeted interventions, including customized training programs, strategic communication prioritization, and resource allocation for change management initiatives aligned with predicted adoption patterns. Full article
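The combination this abstract describes, Lasso with recursive feature elimination evaluated by leave-one-out cross-validation, can be sketched as follows. The synthetic regression data, alpha value, and feature counts are assumptions chosen only to mirror the described small-sample regime (51 responses, 7 composite predictors, 6 retained), not the actual survey data or tuned model.

```python
# Minimal sketch, under stated assumptions, of the optimal model described
# above: RFE paired with Lasso regression, evaluated by leave-one-out CV.
# Synthetic data stand in for the 51-response survey; alpha is illustrative.
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline

# Small-sample regime: 51 observations over 7 composite predictors.
X, y = make_regression(n_samples=51, n_features=7, n_informative=4,
                       noise=5.0, random_state=0)

# RFE trims the input space to six predictors before the final Lasso fit.
model = make_pipeline(RFE(Lasso(alpha=0.1), n_features_to_select=6),
                      Lasso(alpha=0.1))

# LOOCV: one held-out prediction per respondent, suited to small n.
y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
r2 = r2_score(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print(round(r2, 3), round(mae, 3))
```

Wrapping RFE inside the pipeline matters: the feature subset is re-selected within each LOOCV fold, avoiding the selection leakage that would inflate the reported R2.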
24 pages, 14077 KB  
Article
Efficient and Interpretable Machine Learning for Student Academic Outcome Prediction
by Hongwen Gu and Yuqi Zhang
Mathematics 2026, 14(4), 626; https://doi.org/10.3390/math14040626 - 11 Feb 2026
Abstract
Understanding and preventing student dropout presents a decision-critical modeling problem involving heterogeneous variables, nonlinear relationships, and the need for transparent inference. This study addresses the prediction of undergraduate academic outcomes, including Graduation, Enrolled, and Dropout, by proposing an efficient and interpretable machine learning framework that explicitly balances predictive performance, feature efficiency, and algorithmic explainability. The empirical analysis relies on a dataset of 4424 student records across 17 undergraduate programs from the Polytechnic Institute of Portalegre, Portugal. In contrast to existing approaches that rely on high-dimensional input spaces and opaque predictive architectures, we develop a reduced-dimensional classification pipeline based on recursive feature elimination with Gradient Boosting and Random Forest models. Starting from a comprehensive set of demographic, academic, and financial indicators, only 20 informative predictors are retained for model construction, substantially reducing input complexity while preserving predictive capacity. Comparative evaluation across multiple learning algorithms identifies Gradient Boosting as the most effective model, achieving an AUC of 0.891. Beyond predictive accuracy, the proposed framework emphasizes model interpretability through the integration of SHapley Additive exPlanations (SHAP), enabling quantitative attribution of feature contributions at both global and instance levels. The analysis reveals that second-semester academic engagement variables—including the number of courses approved, evaluated, and enrolled—as well as tuition fee payment status and age at enrollment, are the dominant factors shaping student outcomes. Overall, the results demonstrate that strong classification performance can be achieved using a compact feature set while maintaining transparent and explainable model behavior. 
By combining mathematically grounded feature selection with principled model explanation, this study advances methodological understanding of how efficiency, interpretability, and predictive accuracy can be jointly optimized in applied machine learning, with implications for decision-support systems in educational analytics. Full article
(This article belongs to the Special Issue Applied Mathematics, Computing, and Machine Learning)
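The reduced-dimensional pipeline this abstract outlines, recursive feature elimination with a Gradient Boosting estimator followed by multiclass evaluation, can be sketched as below. The synthetic three-class data, feature counts, and tree settings are assumptions for illustration, not the Portalegre student records or the authors' tuned configuration.

```python
# Illustrative sketch: RFE with a Gradient Boosting estimator retains 20
# predictors, and the refit model is scored by one-vs-rest multiclass AUC.
# Data are synthetic stand-ins, not the real student records.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Three outcomes (Graduation / Enrolled / Dropout) over a wider feature set.
X, y = make_classification(n_samples=600, n_features=35, n_informative=12,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Recursive feature elimination down to 20 informative predictors.
rfe = RFE(GradientBoostingClassifier(n_estimators=50, random_state=0),
          n_features_to_select=20).fit(X_tr, y_tr)

# Refit on the retained predictors and score by multiclass AUC.
clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(rfe.transform(X_tr), y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(rfe.transform(X_te)),
                    multi_class="ovr")
print(round(auc, 3))
```

For the interpretability step, the fitted tree model could then be passed to a SHAP explainer to attribute predictions to the retained features, as the abstract describes.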
23 pages, 3417 KB  
Article
The Main Control Factors and Productivity Evaluation Method of Stimulated Well Production Based on an Interpretable Machine Learning Model
by Jin Li, Huiqing Liu, Lin Yan, Zhiping Wang, Hongliang Wang, Shaojun Wang, Xue Qin and Hui Feng
Energies 2026, 19(2), 548; https://doi.org/10.3390/en19020548 - 21 Jan 2026
Abstract
Low-permeability waterflooding reservoirs face numerous challenges, including low productivity per well, inadequate formation pressure maintenance, poor waterflood response, and low water injection utilization efficiency. In the Bai 153 Block of the Changqing Oilfield, for example, the primary concern has shifted in recent years from fracture water breakthrough to formation blockages. Currently, low-yield wells (≤0.5 t) constitute a significant proportion (27.5%), with a recovery factor of only 0.41%. The effectiveness of stimulation treatments is influenced by reservoir properties, treatment types, process parameters, and production performance. Selecting candidate wells requires collecting and analyzing data such as individual well and block characteristics, and evaluating treatment effectiveness involves substantial effort and complexity. Early fracturing treatments exhibited significant variations in effectiveness, and the primary controlling factors influencing fracturing success remained unclear. This paper proposes a big data analysis-based method for evaluating stimulation effectiveness in low-permeability waterflooding reservoirs. Utilizing preprocessed geological, construction, and production data from the target block, an integrated application of the Random Forest algorithm and Recursive Feature Elimination ranks the importance of factors affecting treatments and identifies the block’s main controlling factors. Using these factors as target parameters, a multivariate quantitative evaluation model for fracturing effectiveness is established. This model employs the Pearson correlation coefficient method, Recursive Feature Elimination, and the Random Forest algorithm. 
Results from the quantitative model indicate that the main controlling factors significantly affecting post-fracturing oil increment are geological parameters such as vertical thickness, fracture pressure, and oil saturation; engineering parameters such as sand ratio, blowout volume, and fracturing method; and production parameters such as pre-measure cumulative fluid production, production months, and pre-measure cumulative oil production. These parameters show the strongest correlation with incremental oil production. The constructed quantitative model demonstrates a linear correlation rate exceeding 85% between predicted fracturing stimulation and actual well test production, verifying its validity. This approach provides a novel method and theoretical foundation for the post-evaluation of oil increment effectiveness from stimulation treatments in low-permeability waterflooding reservoirs. Full article
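The factor-ranking step this abstract describes, Random Forest importance combined with recursive feature elimination, can be sketched as follows. The factor names and the synthetic oil-increment target are hypothetical stand-ins for the block's real geological, engineering, and production data; none of the coefficients come from the paper.

```python
# Sketch of the factor-ranking step: a Random Forest scores candidate
# factors by importance, and RFE prunes to a main-controlling subset.
# Factor names and the synthetic target are hypothetical stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
names = ["vertical_thickness", "fracture_pressure", "oil_saturation",
         "sand_ratio", "cum_fluid_prod", "production_months",
         "noise_a", "noise_b"]
X = rng.normal(size=(200, len(names)))
# Synthetic oil-increment target driven by three of the candidate factors.
y = 2.0 * X[:, 0] + 1.5 * X[:, 3] + 1.0 * X[:, 4] + rng.normal(0.0, 0.3, 200)

# Importance ranking of all candidate factors.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
ranking = sorted(zip(names, rf.feature_importances_), key=lambda t: -t[1])

# RFE keeps only the strongest factors as the "main controlling" set.
rfe = RFE(RandomForestRegressor(n_estimators=100, random_state=0),
          n_features_to_select=3).fit(X, y)
main_factors = [n for n, keep in zip(names, rfe.support_) if keep]
print(ranking[0][0], main_factors)
```

In the paper's workflow, this retained subset would then feed the multivariate quantitative evaluation model alongside Pearson correlation screening.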