MDPI - Publisher of Open Access Journals

24 pages, 1508 KiB

Open Access Article

Genomic Prediction of Adaptation in Common Bean (Phaseolus vulgaris L.) × Tepary Bean (P. acutifolius A. Gray) Hybrids

by Felipe López-Hernández, Diego F. Villanueva-Mejía, Adriana Patricia Tofiño-Rivera and Andrés J. Cortés

Int. J. Mol. Sci. 2025, 26(15), 7370; https://doi.org/10.3390/ijms26157370 - 30 Jul 2025

Viewed by 193

Abstract

Climate change is jeopardizing global food security, with at least 713 million people facing hunger. To face this challenge, legumes such as common beans could offer a nature-based solution, sourcing nutrients and dietary fiber, especially for rural communities in Latin America and Africa. However, since common beans are generally heat and drought susceptible, it is imperative to speed up their molecular introgressive adaptive breeding so that they can be cultivated in regions affected by extreme weather. Therefore, this study aimed to couple an advanced panel of common bean (Phaseolus vulgaris L.) × tolerant Tepary bean (P. acutifolius A. Gray) interspecific lines with Bayesian regression algorithms to forecast adaptation to the humid and dry sub-regions of the Caribbean coast of Colombia, where the common bean typically exhibits maladaptation to extreme heat waves. A total of 87 advanced lines with hybrid ancestries were successfully bred, overcoming interspecific incompatibilities. This hybrid panel was genotyped by sequencing (GBS), leading to the discovery of 15,645 single-nucleotide polymorphism (SNP) markers. Three yield components (yield per plant, number of seeds, and number of pods) and two biomass variables (vegetative and seed biomass) were recorded for each genotype and input into several Bayesian regression models to identify the top genotypes with the best genetic breeding values across three localities on the Colombian coast. We comparatively analyzed several regression approaches, and the model with the best performance for all traits and localities was BayesC. We also compared using all markers against using only those identified as associated by a priori genome-wide association study (GWAS) models. The better prediction ability obtained with the complete SNP set was indicative of missing heritability in the GWAS reconstructions.
Furthermore, optimal SNP sets per trait and locality were determined as the top 500 most explanatory markers according to their β regression effects. On average, these 500-SNP sets overlapped by 5.24% across localities, which reinforces the locality-dependent nature of polygenic adaptation. Finally, we retrieved the genomic estimated breeding values (GEBVs) and selected the top 10 genotypes for each trait and locality as part of a recommendation scheme targeting narrow adaptation in the Caribbean. After validation under field conditions and screening for stability, candidate genotypes and SNPs may be used in further introgressive breeding cycles for adaptation.

(This article belongs to the Special Issue Plant Breeding and Genetics: New Findings and Perspectives)
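
The ranking steps described above can be sketched as follows, assuming marker effects (β) have already been estimated (for example, as posterior means from a BayesC fit) and that genotypes are coded 0/1/2. The function names, the coding, and the overlap definition (shared markers as a percentage of one 500-SNP set) are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def rank_by_gebv(X, beta, top_genotypes=10, top_snps=500):
    """Rank genotypes by GEBV and markers by absolute effect size.

    X    : (n_genotypes, n_snps) marker matrix, assumed coded 0/1/2.
    beta : (n_snps,) estimated marker effects (e.g., BayesC posterior means).
    """
    gebv = X @ beta                                         # GEBV = sum of genotype code * marker effect
    best_genotypes = np.argsort(gebv)[::-1][:top_genotypes]
    best_snps = np.argsort(np.abs(beta))[::-1][:top_snps]   # "most explanatory" taken as largest |beta|
    return gebv, best_genotypes, best_snps


def overlap_percent(snps_a, snps_b):
    """Percentage of markers in set A that also appear in set B."""
    a = set(np.asarray(snps_a).tolist())
    b = set(np.asarray(snps_b).tolist())
    return 100.0 * len(a & b) / len(a)


# Hypothetical usage with simulated data (87 lines, 2,000 markers).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(87, 2000)).astype(float)
beta_site1 = rng.normal(size=2000)   # stand-in for effects estimated at locality 1
beta_site2 = rng.normal(size=2000)   # stand-in for effects estimated at locality 2
gebv, top10, snps1 = rank_by_gebv(X, beta_site1)
_, _, snps2 = rank_by_gebv(X, beta_site2)
print(top10, overlap_percent(snps1, snps2))
```

Because each GEBV here is the linear combination X·β, centering the marker matrix shifts every GEBV by the same constant and leaves the genotype ranking unchanged.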

22 pages, 12611 KiB

Open Access Article

Banana Fusarium Wilt Recognition Based on UAV Multi-Spectral Imagery and Automatically Constructed Enhanced Features

by Ye Su, Longlong Zhao, Huichun Ye, Wenjiang Huang, Xiaoli Li, Hongzhong Li, Jinsong Chen, Weiping Kong and Biyao Zhang

Agronomy 2025, 15(8), 1837; https://doi.org/10.3390/agronomy15081837 - 29 Jul 2025

Viewed by 102

Abstract

Banana Fusarium wilt (BFW, also known as Panama disease) is a highly infectious and destructive disease that threatens global banana production, requiring early recognition for timely prevention and control. Current monitoring methods primarily rely on continuous variable features—such as band reflectances (BRs) and vegetation indices (VIs)—collectively referred to as basic features (BFs)—which are prone to noise during the early stages of infection and struggle to capture subtle spectral variations, thus limiting the recognition accuracy. To address this limitation, this study proposes a discretized enhanced feature (EF) construction method, the automated kernel density segmentation-based feature construction algorithm (AutoKDFC). By analyzing the differences in the kernel density distributions between healthy and diseased samples, the AutoKDFC automatically determines the optimal segmentation threshold, converting continuous BFs into binary features with higher discriminative power for early-stage recognition. Using UAV-based multi-spectral imagery, BFW recognition models are developed and tested with the random forest (RF), support vector machine (SVM), and Gaussian naïve Bayes (GNB) algorithms. The results show that EFs exhibit significantly stronger correlations with BFW’s presence than original BFs. Feature importance analysis via RF further confirms that EFs contribute more to the model performance, with VI-derived features outperforming BR-based ones. The integration of EFs results in average performance gains of 0.88%, 2.61%, and 3.07% for RF, SVM, and GNB, respectively, with SVM achieving the best performance, averaging over 90%. Additionally, the generated BFW distribution map closely aligns with ground observations and captures spectral changes linked to disease progression, validating the method’s practical utility. Overall, the proposed AutoKDFC method demonstrates high effectiveness and generalizability for BFW recognition. 
Its core concept of “automatic feature enhancement” has strong potential for broader applications in crop disease monitoring and supports the development of intelligent early warning systems in plant health management.

(This article belongs to the Section Pest and Disease Management)
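
The core idea of AutoKDFC, turning a continuous basic feature into a binary enhanced feature by thresholding where the class densities separate, can be illustrated with a minimal sketch. Here the threshold is assumed to be the point between the two class medians where the healthy and diseased kernel density estimates cross; the paper's actual threshold criterion may differ, and `kde_threshold` and `binarize` are hypothetical names.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_threshold(healthy, diseased, n_grid=512):
    """Cut point where the healthy and diseased KDEs cross between the class medians."""
    healthy, diseased = np.asarray(healthy, float), np.asarray(diseased, float)
    kde_h, kde_d = gaussian_kde(healthy), gaussian_kde(diseased)
    lo = min(healthy.min(), diseased.min())
    hi = max(healthy.max(), diseased.max())
    grid = np.linspace(lo, hi, n_grid)
    diff = kde_h(grid) - kde_d(grid)              # positive where "healthy" is more likely
    m_h, m_d = np.median(healthy), np.median(diseased)
    idx = np.flatnonzero((grid >= min(m_h, m_d)) & (grid <= max(m_h, m_d)))
    if idx.size < 2:
        return 0.5 * (m_h + m_d)                  # fallback: midpoint of the medians
    sign_change = np.sign(diff[idx[:-1]]) != np.sign(diff[idx[1:]])
    crossings = idx[:-1][sign_change]
    if crossings.size == 0:
        return 0.5 * (m_h + m_d)
    i = crossings[0]
    return 0.5 * (grid[i] + grid[i + 1])          # midpoint of the bracketing grid cell


def binarize(values, threshold, diseased_above):
    """Enhanced feature: 1 on the 'diseased' side of the threshold, else 0."""
    values = np.asarray(values)
    flag = values > threshold if diseased_above else values < threshold
    return flag.astype(int)


# Hypothetical vegetation-index values for healthy and infected plants.
rng = np.random.default_rng(1)
healthy = rng.normal(0.6, 0.05, 500)
diseased = rng.normal(0.4, 0.05, 500)
t = kde_threshold(healthy, diseased)
ef = binarize(diseased, t, diseased_above=np.median(diseased) > np.median(healthy))
print(round(t, 3), ef.mean())
```

With equal class priors, the density crossing is where the two classes are equally likely, which is one natural reading of "optimal segmentation threshold".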

20 pages, 7640 KiB

Open Access Article

Land Cover Mapping Using High-Resolution Satellite Imagery and a Comparative Machine Learning Approach to Enhance Regional Water Resource Management

by János Tamás, Angura Louis, Zsolt Zoltán Fehér and Attila Nagy

Remote Sens. 2025, 17(15), 2591; https://doi.org/10.3390/rs17152591 - 25 Jul 2025

Viewed by 239

Abstract

Accurate land cover classification is vital for informed water resource management, especially in irrigation-dependent regions facing increased climate variability. Using fused multi-sensor remote sensing imagery from Landsat 8 and Sentinel-2, this study assesses the effectiveness of three machine learning classifiers: Random Forest (RF), Gradient Tree Boosting (GTB), and Naive Bayes (NB) in creating land cover maps for the Tisza-Körös Valley Irrigation System (TIKEVIR) in Hungary. Water bodies, built-up areas, forests, grasslands, and major crops were among the important land cover categories that were classified for the two agricultural seasons (2018 and 2022). RF performed consistently in 2022 and reached its best accuracy in 2018 (OA = 0.87, KC = 0.83, PI = 0.94). While NB’s performance in 2022 remained less consistent, GTB’s performance increased. The findings show that RF works effectively for generating accurate land cover data, providing useful information for regional monitoring, and assisting in water and environmental management decision-making.

(This article belongs to the Section Remote Sensing in Agriculture and Vegetation)
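
The overall accuracy (OA) and kappa coefficient (KC) reported above are typically computed from a classification confusion matrix. The sketch below assumes KC denotes Cohen's kappa; the confusion-matrix values in the usage lines are made up for illustration.

```python
import numpy as np


def overall_accuracy_and_kappa(cm):
    """OA and Cohen's kappa from a square confusion matrix (rows = reference, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_observed = np.trace(cm) / n                                  # overall accuracy
    p_chance = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / n ** 2    # agreement expected by chance
    kappa = (p_observed - p_chance) / (1.0 - p_chance)
    return p_observed, kappa


# Made-up 2-class confusion matrix, for illustration only.
oa, kc = overall_accuracy_and_kappa([[50, 10], [5, 35]])
print(round(oa, 3), round(kc, 3))   # prints 0.85 0.694
```

Kappa discounts the agreement a classifier would reach by chance given the class frequencies, which is why it is lower than OA here.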

24 pages, 1572 KiB

Open Access Article

Optimizing DNA Sequence Classification via a Deep Learning Hybrid of LSTM and CNN Architecture

by Elias Tabane, Ernest Mnkandla and Zenghui Wang

Appl. Sci. 2025, 15(15), 8225; https://doi.org/10.3390/app15158225 - 24 Jul 2025

Viewed by 213

Abstract

This study addresses the performance of deep learning models for human DNA sequence classification through an exploration of optimal feature representation, model architecture, and hyperparameter tuning. It contrasts traditional machine learning with advanced deep learning approaches to assess how performance varies with genomic data complexity. A hybrid network combining long short-term memory (LSTM) and convolutional neural networks (CNN) was developed to capture long-distance dependencies as well as local patterns in DNA sequences. The hybrid LSTM + CNN model achieved a classification accuracy of 100%, which is significantly higher than traditional approaches such as logistic regression (45.31%), naïve Bayes (17.80%), and random forest (69.89%), as well as other machine learning models such as XGBoost (81.50%) and k-nearest neighbor (70.77%). Among deep learning techniques, the DeepSea model also performed well (76.59%), while others such as DeepVariant (67.00%) and graph neural networks (30.71%) performed worse. Preprocessing techniques, notably one-hot encoding and DNA embeddings, were central to transforming sequence data into a form compatible with deep learning. The findings underscore the robustness of hybrid architectures in genomic classification tasks and warrant future research on encoding strategies, model architecture, and hyperparameter tuning to further improve accuracy and generalization in DNA sequence analysis.

(This article belongs to the Special Issue Advances in Deep Learning for Complex Combinatorial Optimization: Applications in Cybersecurity, Healthcare, and Intelligent Systems)
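
One-hot encoding, one of the preprocessing steps mentioned above, maps each nucleotide to a length-4 indicator vector so that a sequence becomes a (length × 4) matrix suitable as CNN or LSTM input. The sketch assumes the A, C, G, T column order and encodes ambiguous bases such as N as all-zero rows; both are common conventions rather than details taken from the paper.

```python
import numpy as np

BASES = "ACGT"   # assumed column order


def one_hot_dna(seq):
    """Return a (len(seq), 4) float32 matrix with one 1.0 per recognized base."""
    index = {base: i for i, base in enumerate(BASES)}
    out = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            out[pos, index[base]] = 1.0
        # ambiguous bases (e.g., 'N') are left as all-zero rows
    return out


print(one_hot_dna("ACGTN"))
```

Sequences of different lengths would still need padding or trimming to a common length before batching.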

26 pages, 2658 KiB

Open Access Article

An Efficient and Accurate Random Forest Node-Splitting Algorithm Based on Dynamic Bayesian Methods

by Jun He, Zhanqi Li and Linzi Yin

Mach. Learn. Knowl. Extr. 2025, 7(3), 70; https://doi.org/10.3390/make7030070 - 21 Jul 2025

Viewed by 241

Abstract

Random Forests are powerful machine learning models widely applied in classification and regression tasks due to their robust predictive performance. Nevertheless, traditional Random Forests face computational challenges during tree construction, particularly with high-dimensional data or on resource-constrained devices. In this paper, a novel node-splitting algorithm, BayesSplit, is proposed to accelerate decision tree construction via a Bayesian impurity estimation framework. BayesSplit treats impurity reduction as a Bernoulli event with a Beta-conjugate prior for each split point and incorporates two main strategies. First, Dynamic Posterior Parameter Refinement updates the Beta parameters based on impurity reductions observed in batch iterations. Second, Posterior-Derived Confidence Bounding establishes statistical confidence intervals that efficiently filter out suboptimal splits. Theoretical analysis demonstrates that BayesSplit converges to optimal splits with high probability, and experimental results show up to a 95% reduction in training time compared to baselines while maintaining or exceeding their generalization performance. Compared to the state-of-the-art MABSplit, BayesSplit achieves similar accuracy on classification tasks and reduces regression training time by 20–70% with lower MSEs. Furthermore, BayesSplit improves feature importance stability by up to 40%, making it particularly suitable for deployment in computationally constrained environments.
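
The Beta-Bernoulli mechanics behind the two strategies above (posterior refinement over batches, then elimination via posterior bounds) can be illustrated with a simplified sketch. It is not the authors' algorithm: each candidate split is given a hypothetical fixed probability that a sampled impurity reduction counts as a "success", Beta(1, 1) priors are assumed, and the elimination rule (drop a split whose upper credible bound falls below the best lower bound) is a common bandit-style choice rather than the paper's exact criterion.

```python
import numpy as np
from scipy.stats import beta as beta_dist


def bayes_split_sketch(success_prob, batch_size=50, max_rounds=40, level=0.95, seed=0):
    """Return (index of the selected split, indices of surviving splits).

    success_prob: hypothetical per-split probability that a sampled impurity
    reduction is a 'success'; it stands in for evidence computed from data batches.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(success_prob, dtype=float)
    a = np.ones(p.size)              # Beta(1, 1) prior for every candidate split
    b = np.ones(p.size)
    alive = np.arange(p.size)
    tail = (1.0 - level) / 2.0
    for _ in range(max_rounds):
        if alive.size == 1:
            break
        # Posterior refinement: one batch of Bernoulli outcomes per surviving split.
        hits = rng.binomial(batch_size, p[alive])
        a[alive] += hits
        b[alive] += batch_size - hits
        # Posterior bounds: equal-tailed credible interval for each survivor.
        lower = beta_dist.ppf(tail, a[alive], b[alive])
        upper = beta_dist.ppf(1.0 - tail, a[alive], b[alive])
        alive = alive[upper >= lower.max()]   # the split with the best lower bound always survives
    posterior_mean = a / (a + b)
    best = alive[np.argmax(posterior_mean[alive])]
    return int(best), alive


best, survivors = bayes_split_sketch([0.2, 0.5, 0.8, 0.3])
print(best, survivors)
```

Eliminating clearly inferior splits early is what saves computation: later batches are only evaluated for the candidates that can still win.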

13 pages, 272 KiB

Open Access Feature Paper Article

Asymptotic Behavior of the Bayes Estimator of a Regression Curve

by Agustín G. Nogales

Mathematics 2025, 13(14), 2319; https://doi.org/10.3390/math13142319 - 21 Jul 2025

Viewed by 138

Abstract

In this work, we prove the convergence to 0, in both $L^1$ and $L^2$, of the error of the Bayes estimator of a regression curve (i.e., the conditional expectation of the response variable given the regressor). The strong consistency of the estimator is also derived. The Bayes estimator of a regression curve is the regression curve with respect to the posterior predictive distribution. The result is general enough to cover discrete and continuous cases, parametric or nonparametric, and no specific assumption is made about the prior distribution. Some examples, two of them nonparametric, are given to illustrate the main result; one of the nonparametric examples exhibits a situation where the estimation of the regression curve has an optimal solution, although the problem of estimating the density is meaningless. An important role in the proof of these results is played by the establishment of a probability space as an adequate framework for addressing the problem of estimating regression curves from the Bayesian point of view, which puts powerful probabilistic tools at our disposal in that endeavor.

(This article belongs to the Section D1: Probability and Statistics)
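
A simple parametric instance makes the definition concrete. Suppose the regressor X takes finitely many values, Y given X = x is Bernoulli(θ_x), and the θ_x have independent Beta(a, b) priors that are also independent of the prior on the distribution of X. Then the regression curve of the posterior predictive distribution at x reduces to the posterior mean (a + s_x)/(a + b + n_x), where n_x counts observations at x and s_x counts successes among them. This toy model is an illustrative assumption, not one of the paper's examples; the simulation only shows the estimator approaching the true curve as the sample grows.

```python
import numpy as np


def bayes_regression_curve(x, y, n_levels, a=1.0, b=1.0):
    """Posterior-predictive regression curve E[Y | X = level, data] under Beta(a, b) priors."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=float)
    s = np.bincount(x, weights=y, minlength=n_levels)   # successes at each level
    n = np.bincount(x, minlength=n_levels)              # observations at each level
    return (a + s) / (a + b + n)


theta = np.array([0.1, 0.5, 0.9])   # hypothetical true regression curve
rng = np.random.default_rng(0)
for n in (30, 300, 30000):
    x = rng.integers(0, 3, size=n)
    y = rng.binomial(1, theta[x])
    curve = bayes_regression_curve(x, y, n_levels=3)
    print(n, np.abs(curve - theta).mean())
```

With no data the curve equals the prior mean a/(a + b) at every level, and the data progressively override the prior.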

29 pages, 4788 KiB

Open Access Article

Statistical and Machine Learning Classification Approaches to Predicting and Controlling Peak Temperatures During Friction Stir Welding (FSW) of Al-6061-T6 Alloys

by Assad Anis, Muhammad Shakaib and Muhammad Sohail Hanif

J. Manuf. Mater. Process. 2025, 9(7), 246; https://doi.org/10.3390/jmmp9070246 - 21 Jul 2025

Viewed by 300

Abstract

This paper presents the optimization of peak temperatures reached during friction stir welding (FSW) of Al-6061-T6 alloys. This research employed a novel approach, investigating the effect of FSW process parameters on peak temperatures through finite element analysis (FEA), the Taguchi method, analysis of variance (ANOVA), and machine learning (ML) algorithms. COMSOL Multiphysics 6.0 was used to perform FEA to predict peak temperatures, incorporating seven distinct welding parameters: tool material, pin diameter, shoulder diameter, tool rotational speed, welding speed, axial force, and coefficient of friction. The influence of these parameters was investigated using an L32 Taguchi array and ANOVA, revealing that axial force and tool rotational speed were the most significant parameters affecting peak temperatures. Some simulations showed temperatures exceeding the material’s melting point, indicating the need for improved thermal control. This was addressed using three ML algorithms: Logistic Regression, k-Nearest Neighbors (k-NN), and Naive Bayes. A dataset of 324 data points was prepared using a factorial design to implement these algorithms, which predicted the welding conditions under which the temperature exceeded the melting temperature of Al-6061-T6. The Logistic Regression classifier demonstrated the highest performance, achieving an accuracy of 98.14%, compared to the Naive Bayes and k-NN classifiers. These findings contribute to sustainable welding practices by minimizing excessive heat generation, preserving material properties, and enhancing weld quality.

16 pages, 1037 KiB

Open Access Article

Generative Learning from Semantically Confused Label Distribution via Auto-Encoding Variational Bayes

by Xinhai Li, Chenxu Meng, Heng Zhou, Yi Guo, Bowen Xue, Tianzuo Yu and Yunan Lu

Electronics 2025, 14(13), 2736; https://doi.org/10.3390/electronics14132736 - 7 Jul 2025

Viewed by 213

Abstract

Label Distribution Learning (LDL) has emerged as a powerful paradigm for addressing label ambiguity, offering a more nuanced quantification of the instance–label relationship than traditional single-label and multi-label learning approaches. This paper focuses on the challenge of noisy label distributions, which are ubiquitous in real-world applications owing to annotator subjectivity, algorithmic biases, and experimental errors. Existing LDL algorithms often assume a linear combination of true and random label distributions when modeling noisy label distributions, an oversimplification that fails to capture how noisy label distributions are generated in practice. Therefore, this paper assumes that the noise in label distributions arises primarily from semantic confusion between labels and proposes a novel generative label distribution learning algorithm that models the confusion-based generation process of both the feature data and the noisy label distribution data. The proposed model is inferred using variational methods, and its effectiveness is demonstrated through extensive experiments across various real-world datasets, showcasing its superiority in handling noisy label distributions.

(This article belongs to the Special Issue Neural Networks: From Software to Hardware)
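
Auto-encoding variational Bayes, the inference technique named in the title, rests on two generic ingredients: the reparameterization trick for sampling a Gaussian latent variable differentiably, and the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior. The sketch below shows only these standard pieces in NumPy; the paper's confusion-based generative model and its latent structure are not reproduced, and the standard-normal prior is an assumption.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) as a deterministic function of mu, log_var and noise."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])
log_var = np.array([0.0, np.log(0.25)])
z = reparameterize(np.tile(mu, (10000, 1)), np.tile(log_var, (10000, 1)), rng)
print(z.mean(axis=0), z.var(axis=0), kl_to_standard_normal(mu, log_var))
```

Because the randomness sits in eps rather than in z itself, gradients of a loss built on z can flow back to mu and log_var, which is what makes the variational objective trainable.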

16 pages, 662 KiB

Open Access Article

Augmenting Naïve Bayes Classifiers with k-Tree Topology

by Fereshteh R. Dastjerdi and Liming Cai

Mathematics 2025, 13(13), 2185; https://doi.org/10.3390/math13132185 - 4 Jul 2025

Viewed by 267

Abstract

The Bayesian network is a directed acyclic graphical model that offers a structured description of probabilistic dependencies among random variables. Bayesian classifiers are powerful tools for classification tasks, but they often require computing joint probability distributions, which can be computationally intractable due to potential full dependencies among feature variables. On the other hand, Naïve Bayes, which presumes zero dependencies among features, trades accuracy for efficiency and often underperforms. As a result, non-zero dependency structures, such as trees, are often used as more feasible probabilistic graph approximations; in particular, Tree Augmented Naïve Bayes (TAN) has been demonstrated to outperform Naïve Bayes and has become a popular choice. For applications where a variable is strongly influenced by multiple other features, TAN has been further extended to the k-dependency Bayesian classifier (KDB), where one feature can depend on up to k other features (for a given $k \geq 2$). In such cases, however, the k parent features for each variable are often selected through heuristic search methods (such as sorting), which do not guarantee an optimal approximation of the network topology. In this paper, the novel notion of k-tree Augmented Naïve Bayes (k-TAN) is introduced to augment Naïve Bayesian classifiers with k-tree topology as an approximation of Bayesian networks. It is proved that, under the Kullback–Leibler divergence measure, the k-tree topology approximation of Bayesian classifiers loses the minimum information with the topology of a maximum spanning k-tree, where the edge weights of the graph are the mutual information between random variables conditional upon the class label. In addition, while finding a maximum spanning k-tree is in general NP-hard for fixed $k \geq 2$, this work shows that the approximation problem can be solved in $O(n^{k+1})$ time if the spanning k-tree is also required to retain a given Hamiltonian path in the graph. Therefore, this algorithm can be employed to efficiently approximate Bayesian networks with k-tree augmented Naïve Bayesian classifiers with a guaranteed minimum loss of information.
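
The edge weights named above, the mutual information between two features conditional on the class label, can be estimated from discrete data by plug-in counting: I(X_i; X_j | C) = Σ p(a, b, c) log[p(a, b | c) / (p(a | c) p(b | c))]. The sketch below computes only that weight matrix; building the maximum spanning k-tree that retains a Hamiltonian path (the paper's actual contribution) is not attempted, and the helper names are hypothetical.

```python
from collections import Counter

import numpy as np


def conditional_mutual_information(xi, xj, c):
    """Plug-in estimate (in nats) of I(Xi; Xj | C) for discrete variables."""
    xi, xj, c = (np.asarray(v).tolist() for v in (xi, xj, c))
    n = len(c)
    n_ijc = Counter(zip(xi, xj, c))
    n_ic = Counter(zip(xi, c))
    n_jc = Counter(zip(xj, c))
    n_c = Counter(c)
    cmi = 0.0
    for (a, b, k), count in n_ijc.items():
        # p(a,b,k) * log[p(a,b|k) / (p(a|k) p(b|k))] written with raw counts
        cmi += (count / n) * np.log(count * n_c[k] / (n_ic[(a, k)] * n_jc[(b, k)]))
    return cmi


def cmi_weight_matrix(X, c):
    """Symmetric matrix of class-conditional mutual information between feature columns."""
    d = X.shape[1]
    W = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            W[i, j] = W[j, i] = conditional_mutual_information(X[:, i], X[:, j], c)
    return W


rng = np.random.default_rng(0)
c = rng.integers(0, 2, size=500)                     # hypothetical class labels
x0 = (c + rng.binomial(1, 0.2, size=500)) % 2        # feature correlated with the class
x1 = x0.copy()                                       # duplicate of x0: fully dependent given C
x2 = rng.integers(0, 3, size=500)                    # independent noise feature
W = cmi_weight_matrix(np.column_stack([x0, x1, x2]), c)
print(np.round(W, 3))
```

Plug-in estimates are biased slightly upward for independent pairs, so the noise column receives small but non-zero weights.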

17 pages, 572 KiB

Open Access Article

Statistical Analysis Under a Random Censoring Scheme with Applications

by Mustafa M. Hasaballah and Mahmoud M. Abdelwahab

Symmetry 2025, 17(7), 1048; https://doi.org/10.3390/sym17071048 - 3 Jul 2025

Cited by 1 | Viewed by 251

Abstract

The Gumbel Type-II distribution is a widely recognized and frequently used lifetime distribution that plays a crucial role in reliability engineering. This paper focuses on statistical inference for the Gumbel Type-II distribution under a random censoring scheme. From a frequentist perspective, point estimates of the unknown parameters are derived using maximum likelihood estimation, and confidence intervals are constructed based on the Fisher information matrix. From a Bayesian perspective, Bayes estimates of the parameters are obtained using the Markov Chain Monte Carlo method, and the average lengths of the credible intervals are calculated. The Bayesian inference is performed under both the squared error loss function and the general entropy loss function. Additionally, a numerical simulation is conducted to evaluate the performance of the proposed methods. To demonstrate their practical applicability, a real-world example is provided, illustrating the application and development of these inference techniques. In conclusion, the Bayesian method appears to outperform the other approaches, although each method offers unique advantages.

(This article belongs to the Special Issue Skewed (Asymmetrical) Probability Distributions and Applications Across Disciplines, Fourth Edition)
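
The frequentist part of this workflow can be sketched under stated assumptions: the Gumbel Type-II distribution is taken with CDF F(x) = exp(-β x^(-α)) for x > 0, censoring is non-informative, and the randomly censored log-likelihood Σ[δ_i log f(t_i) + (1 - δ_i) log(1 - F(t_i))] is maximized numerically with SciPy. The censoring distribution used in the simulation (Gumbel Type-II with the same α) and all parameter values are hypothetical. The helper `bayes_estimates` shows how Bayes estimates under squared error loss (the posterior mean) and general entropy loss ((E[θ^(-q)])^(-1/q)) would be computed from MCMC draws; the MCMC sampler itself is not shown.

```python
import numpy as np
from scipy.optimize import minimize


def gumbel2_sample(alpha, beta, size, rng):
    """Inverse-CDF sampling from F(x) = exp(-beta * x**(-alpha))."""
    u = rng.uniform(size=size)
    return (beta / -np.log(u)) ** (1.0 / alpha)


def neg_log_lik(log_params, t, delta):
    """Negative log-likelihood under random censoring (delta = 1 if the failure is observed)."""
    alpha, beta = np.exp(log_params)          # optimize on the log scale to keep parameters positive
    z = beta * t ** (-alpha)
    log_f = np.log(alpha) + np.log(beta) - (alpha + 1.0) * np.log(t) - z
    log_surv = np.log(-np.expm1(-z))          # log(1 - exp(-z)), numerically stable
    return -np.sum(delta * log_f + (1 - delta) * log_surv)


def fit_mle(t, delta):
    res = minimize(neg_log_lik, x0=np.zeros(2), args=(t, delta), method="Nelder-Mead")
    return np.exp(res.x)


def bayes_estimates(draws, q=1.0):
    """Bayes estimates from posterior draws: (squared error loss, general entropy loss)."""
    draws = np.asarray(draws, dtype=float)
    return draws.mean(), np.mean(draws ** (-q)) ** (-1.0 / q)


# Hypothetical simulation: lifetimes x, censoring times c, observed t = min(x, c).
rng = np.random.default_rng(0)
x = gumbel2_sample(2.0, 1.5, 3000, rng)
c = gumbel2_sample(2.0, 4.5, 3000, rng)      # with a shared alpha, P(censored) = 1.5 / (1.5 + 4.5)
t = np.minimum(x, c)
delta = (x <= c).astype(float)
alpha_hat, beta_hat = fit_mle(t, delta)
print(alpha_hat, beta_hat, 1 - delta.mean())
```

With a shared α, X^(-α) and C^(-α) are exponential with rates β and β_c, so about 25% of units are censored in this simulation.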

21 pages, 5516 KiB

Open Access Feature Paper Article

Hyperspectral Imaging for Non-Destructive Moisture Prediction in Oat Seeds

by Peng Zhang and Jiangping Liu

Agriculture 2025, 15(13), 1341; https://doi.org/10.3390/agriculture15131341 - 22 Jun 2025

Viewed by 461

Abstract

Oat is a highly nutritious cereal crop, and the moisture content of its seeds plays a vital role in cultivation management, storage preservation, and quality control. To enable efficient and non-destructive prediction of this key quality parameter, this study presents a modeling framework integrating hyperspectral imaging (HSI) technology with a dual-optimization machine learning strategy. Seven spectral preprocessing techniques—standard normal variate (SNV), multiplicative scatter correction (MSC), first derivative (FD), second derivative (SD), and combinations such as SNV + FD, SNV + SD, and SNV + MSC—were systematically evaluated. Among them, SNV combined with FD was identified as the optimal preprocessing scheme, effectively enhancing spectral feature expression. To further refine the predictive model, three feature selection methods—successive projections algorithm (SPA), competitive adaptive reweighted sampling (CARS), and principal component analysis (PCA)—were assessed. PCA exhibited superior performance in information compression and modeling stability. Subsequently, a dual-optimized neural network model, termed Bayes-ASFSSA-BP, was developed by incorporating Bayesian optimization and the Adaptive Spiral Flight Sparrow Search Algorithm (ASFSSA). Bayesian optimization was used for global tuning of network structural parameters, while ASFSSA was applied to fine-tune the initial weights and thresholds, improving convergence efficiency and predictive accuracy. The proposed Bayes-ASFSSA-BP model achieved determination coefficients (R²) of 0.982 and 0.963, and root mean square errors (RMSEs) of 0.173 and 0.188 on the training and test sets, respectively. The corresponding mean absolute error (MAE) on the test set was 0.170, indicating excellent average prediction accuracy. These results significantly outperformed benchmark models such as SSA-BP, ASFSSA-BP, and Bayes-BP. 
Compared to the conventional BP model, the proposed approach increased the test R² by 0.046 and reduced the RMSE by 0.157. Moreover, the model produced the narrowest 95% confidence intervals for test set performance (Rp²: [0.961, 0.971]; RMSE: [0.185, 0.193]), demonstrating outstanding robustness and generalization capability. Although the model incurred a slightly higher computational cost (480.9 s), the accuracy gain was deemed worthwhile. In conclusion, the proposed Bayes-ASFSSA-BP framework shows strong potential for accurate and stable non-destructive prediction of oat seed moisture content. This work provides a practical and efficient solution for quality assessment in agricultural products and highlights the promise of integrating Bayesian optimization with ASFSSA in modeling high-dimensional spectral data.

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
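
The SNV + FD preprocessing identified above as optimal can be sketched in a few lines: SNV centers and scales each spectrum by its own mean and standard deviation, and the first derivative is then taken along the wavelength axis. Computing the derivative with a Savitzky-Golay filter (window 11, polynomial order 2) is an assumption here; the paper may have used a different derivative scheme or window.

```python
import numpy as np
from scipy.signal import savgol_filter


def snv(spectra):
    """Standard normal variate: normalize each row (one spectrum) to zero mean and unit std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def snv_first_derivative(spectra, window=11, polyorder=2):
    """SNV followed by a Savitzky-Golay first derivative along the band axis."""
    return savgol_filter(snv(spectra), window_length=window, polyorder=polyorder, deriv=1, axis=1)


# Hypothetical spectra: 3 samples x 50 bands with sample-specific offset and scale.
bands = np.linspace(0.0, 1.0, 50)
base = np.sin(3 * bands)
spectra = np.vstack([base, 2 * base + 0.3, 0.5 * base - 0.1])
processed = snv_first_derivative(spectra)
print(processed.shape)
```

Because SNV removes each spectrum's own offset and positive scale factor, the three simulated spectra collapse onto the same processed curve, which is the scatter-correction effect SNV is meant to provide.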

18 pages, 839 KiB

Open Access Article

From Narratives to Diagnosis: A Machine Learning Framework for Classifying Sleep Disorders in Aging Populations: The sleepCare Platform

by Christos A. Frantzidis

Brain Sci. 2025, 15(7), 667; https://doi.org/10.3390/brainsci15070667 - 20 Jun 2025

Viewed by 974

Abstract

Background/Objectives: Sleep disorders are prevalent among aging populations and are often linked to cognitive decline, chronic conditions, and reduced quality of life. Traditional diagnostic methods, such as polysomnography, are resource-intensive and limited in accessibility. Meanwhile, individuals frequently describe their sleep experiences through unstructured narratives in clinical notes, online forums, and telehealth platforms. This study proposes a machine learning pipeline (sleepCare) that classifies sleep-related narratives into clinically meaningful categories, including stress-related, neurodegenerative, and breathing-related disorders. The proposed framework employs natural language processing (NLP) and machine learning techniques to support remote applications and real-time patient monitoring, offering a scalable solution for the early identification of sleep disturbances. Methods: sleepCare consists of a three-tiered classification pipeline to analyze narrative sleep reports. First, a baseline model used a Multinomial Naïve Bayes classifier with n-gram features from a Bag-of-Words representation. Next, a Support Vector Machine (SVM) was trained on GloVe-based word embeddings to capture semantic context. Finally, a transformer-based model (BERT) was fine-tuned to extract contextual embeddings, using the [CLS] token as input for SVM classification. Each model was evaluated using stratified train-test splits and 10-fold cross-validation. Hyperparameter tuning via GridSearchCV was used to optimize performance. The dataset contained 475 labeled sleep narratives, classified into five etiological categories relevant for clinical interpretation. Results: The transformer-based model utilizing BERT embeddings and an optimized Support Vector Machine classifier achieved an overall accuracy of 81% on the test set. Class-wise F1-scores ranged from 0.72 to 0.91, with the highest performance observed in classifying normal or improved sleep (F1 = 0.91).
The macro average F1-score was 0.78, indicating balanced performance across all categories. GridSearchCV identified the optimal SVM parameters (C = 4, kernel = ‘rbf’, gamma = 0.01, degree = 2, class_weight = ‘balanced’). The confusion matrix revealed robust classification with limited misclassifications, particularly between overlapping symptom categories such as stress-related and neurodegenerative sleep disturbances. Conclusions: Unlike generic large language model applications, our approach emphasizes the personalized identification of sleep symptomatology through targeted classification of the narrative input. By integrating structured learning with contextual embeddings, the framework offers a clinically meaningful, scalable solution for early detection and differentiation of sleep disorders in diverse, real-world, and remote settings.

(This article belongs to the Special Issue Perspectives of Artificial Intelligence (AI) in Aging Neuroscience)
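
The first tier of sleepCare (a Multinomial Naïve Bayes classifier on n-gram Bag-of-Words features, tuned with GridSearchCV) maps naturally onto a scikit-learn pipeline. The sketch below is an illustration under loud assumptions: the eight toy narratives and two labels are invented for demonstration (the real dataset has 475 narratives in five categories), and the parameter grid and 2-fold cross-validation are chosen only so the example runs on so little data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Invented toy narratives (not from the sleepCare dataset).
texts = [
    "I lie awake worrying about work",
    "anxiety keeps me up at night",
    "racing thoughts about deadlines keep me awake",
    "stress at work ruins my sleep",
    "I snore loudly and wake up gasping",
    "my partner says I stop breathing at night",
    "I wake up choking and gasping for air",
    "loud snoring and gasping wake me",
]
labels = ["stress"] * 4 + ["breathing"] * 4

pipeline = Pipeline([
    ("bow", CountVectorizer()),   # Bag-of-Words token counts
    ("nb", MultinomialNB()),
])
param_grid = {
    "bow__ngram_range": [(1, 1), (1, 2)],   # unigrams vs. unigrams + bigrams
    "nb__alpha": [0.1, 1.0],                # additive smoothing strength
}
search = GridSearchCV(
    pipeline, param_grid, cv=StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
)
search.fit(texts, labels)
print(search.best_params_, search.predict(["I snore and wake up gasping"]))
```

Keeping the vectorizer inside the pipeline means each cross-validation fold builds its vocabulary only from its own training split, which avoids leaking test-fold words into the model.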

27 pages, 6291 KiB

Open Access Article

Data-Driven Fault Detection and Diagnosis in Cooling Units Using Sensor-Based Machine Learning Classification

by Amilcar Quispe-Astorga, Roger Jesus Coaquira-Castillo, L. Walter Utrilla Mego, Julio Cesar Herrera-Levano, Yesenia Concha-Ramos, Erwin J. Sacoto-Cabrera and Edison Moreno-Cardenas

Sensors 2025, 25(12), 3647; https://doi.org/10.3390/s25123647 - 11 Jun 2025

Viewed by 684

Abstract

Precision air conditioning (PAC) systems are prone to various types of failures, leading to inefficiencies, increased energy consumption, and possible reductions in equipment performance. This study proposes an automatic real-time fault detection and diagnosis system that classifies events as either faulty or normal by analyzing key status signals such as pressure, temperature, current, and voltage. The research is based on data-driven models and machine learning, with a specific strategy proposed for five types of system failures. The work was carried out on a Rittal PAC, model SK3328.500 (cooling unit), fitted with capacitive pressure sensors, Hall effect current sensors, electromagnetic induction voltage sensors, infrared temperature sensors, and thermocouple-type sensors. To implement the system, a dataset of PAC status signals was collected, initially consisting of 31,057 samples; after a preprocessing step using the Random Under-Sampler (RUS) module, a database of 20,000 samples was obtained, including normal and faulty operating events generated in the PAC. Models were selected based on accuracy, evaluated by testing under both offline (database) and real-time conditions. The Support Vector Machine (SVM) model achieved 93%, Decision Tree (DT) 93%, Gradient Boosting (GB) 91%, K-Nearest Neighbors (KNN) 83%, and Naive Bayes (NB) 77%, while the Random Forest (RF) model stood out with an accuracy of 96% in offline tests and 95.28% in real time. Finally, a validation test was performed in real time with the best-selected model, simulating a real environment for the PAC system, and achieved an accuracy of 93.49%.

(This article belongs to the Section Fault Diagnosis & Sensors)


23 pages, 3830 KiB

Open Access Article

A Hybrid Artificial Intelligence Approach for Down Syndrome Risk Prediction in First Trimester Screening

by Emre Yalçın, Serpil Aslan, Mesut Toğaçar and Süleyman Cansun Demir

Diagnostics 2025, 15(12), 1444; https://doi.org/10.3390/diagnostics15121444 - 6 Jun 2025

Viewed by 831

Abstract


Background/Objectives: The aim of this study is to develop a hybrid artificial intelligence (AI) approach to improve the accuracy, efficiency, and reliability of Down Syndrome (DS) risk prediction during first trimester prenatal screening. The proposed method transforms one-dimensional (1D) patient data, including features such as nuchal translucency (NT), human chorionic gonadotropin (hCG), and pregnancy-associated plasma protein A (PAPP-A), into two-dimensional (2D) Aztec barcode images, enabling advanced feature extraction with transformer-based deep learning models. Methods: The dataset consists of 958 anonymous patient records. Each record includes four first trimester screening markers, among them hCG, PAPP-A, and NT, expressed as multiples of the median. The DS risk outcome was categorized into three classes: high, medium, and low. Three transformer architectures (DeiT3, MaxViT, and Swin) were employed to extract high-level features from the generated barcodes. The extracted features were combined into a unified set, and dimensionality was reduced using two feature selection techniques: minimum Redundancy Maximum Relevance (mRMR) and ReliefF. Only the features selected by both methods (their intersection) were retained, forming a compact and informative feature subset. The final features were classified using machine learning algorithms, including Bagged Trees and Naive Bayes. Results: The proposed approach achieved up to 100% classification accuracy using the Naive Bayes classifier with 1250 features selected by ReliefF and 527 intersecting features from mRMR. By selecting a smaller but more informative subset of features, the system substantially reduced hardware and processing demands while maintaining strong predictive performance. Conclusions: The results suggest that the proposed hybrid AI method offers a promising and resource-efficient solution for DS risk assessment in first trimester screening. However, further comparative studies are recommended to validate its performance in broader clinical contexts.
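The abstract describes concatenating the DeiT3, MaxViT, and Swin features into one set and then keeping only the features that both mRMR and ReliefF select, without giving code or saying exactly how each selector's cut-off was chosen. The stdlib sketch below assumes that each selector yields one relevance score per fused feature and keeps its top k; the scores are placeholder numbers, not outputs of real mRMR or ReliefF implementations, and the helper names `fuse_features`, `top_k`, and `intersect_selections` are hypothetical.

```python
def fuse_features(*backbone_vectors):
    """Concatenate one record's feature vectors from several backbones."""
    fused = []
    for vec in backbone_vectors:
        fused.extend(vec)
    return fused


def top_k(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def intersect_selections(selected_a, selected_b):
    """Features chosen by both selectors, kept in selector A's order."""
    chosen_b = set(selected_b)
    return [i for i in selected_a if i in chosen_b]


# Hypothetical per-backbone vectors for one barcode image
fused = fuse_features([0.1, 0.2], [0.3], [0.4, 0.5, 0.6])
# Placeholder relevance scores for the 6 fused features (not real mRMR/ReliefF output)
mrmr_scores = [0.9, 0.1, 0.7, 0.3, 0.8, 0.2]
relieff_scores = [0.7, 0.6, 0.9, 0.1, 0.2, 0.8]
kept = intersect_selections(top_k(mrmr_scores, 3), top_k(relieff_scores, 3))
print(len(fused), kept)  # 6 [0, 2]
```

Intersecting two selectors trades recall for agreement: a feature survives only if two different relevance criteria both rank it highly, which is one way to obtain the smaller subset the abstract credits with lower processing demands.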

(This article belongs to the Special Issue Artificial Intelligence for Health and Medicine)


28 pages, 13036 KiB

Open Access Article

Statistical Analysis of a Generalized Variant of the Weibull Model Under Unified Hybrid Censoring with Applications to Cancer Data

by Mazen Nassar, Refah Alotaibi and Ahmed Elshahhat

Axioms 2025, 14(6), 442; https://doi.org/10.3390/axioms14060442 - 5 Jun 2025

Viewed by 429

Abstract


This paper investigates an understudied generalization of the classical exponential, Rayleigh, and Weibull distributions, known as the power generalized Weibull distribution, particularly in the context of censored data. Characterized by one scale parameter and two shape parameters, the proposed model offers enhanced flexibility for modeling diverse lifetime data patterns and hazard rate behaviors. Notably, its hazard rate function can exhibit five distinct shapes, including upside-down bathtub and bathtub shapes. The study focuses on classical and Bayesian estimation frameworks for the model parameters and associated reliability metrics under a unified hybrid censoring scheme. Methodologies include both point estimation (maximum likelihood and posterior mean estimators) and interval estimation (approximate confidence intervals and Bayesian credible intervals). To evaluate the performance of these estimators, a comprehensive simulation study is conducted under varied experimental conditions. Furthermore, two empirical applications on real-world cancer datasets underscore the efficacy of the proposed estimation methods and the practical viability and flexibility of the explored model compared to eleven other existing lifespan models.
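The abstract does not spell out the distribution's formulas. Assuming the commonly cited power generalized Weibull parameterization, with scale sigma and shapes nu and gamma (the paper's own symbols may differ), the survival and hazard functions for t > 0 are S(t) = exp(1 - (1 + (t/sigma)^nu)^(1/gamma)) and h(t) = nu t^(nu - 1) (1 + (t/sigma)^nu)^(1/gamma - 1) / (gamma sigma^nu). The sketch evaluates both; setting gamma = 1 recovers the Weibull model, and nu = 1 or nu = 2 then gives the exponential or Rayleigh case, matching the generalization the abstract describes.

```python
import math


def pgw_survival(t, sigma, nu, gamma):
    """S(t) = exp(1 - (1 + (t/sigma)**nu)**(1/gamma)), for t >= 0 (assumed parameterization)."""
    return math.exp(1.0 - (1.0 + (t / sigma) ** nu) ** (1.0 / gamma))


def pgw_hazard(t, sigma, nu, gamma):
    """h(t) = nu * t**(nu-1) * (1 + (t/sigma)**nu)**(1/gamma - 1) / (gamma * sigma**nu), for t > 0."""
    z = (t / sigma) ** nu
    return nu * t ** (nu - 1.0) * (1.0 + z) ** (1.0 / gamma - 1.0) / (gamma * sigma ** nu)


# gamma = 1, nu = 1: exponential model with constant hazard 1/sigma
print(pgw_hazard(1.0, 2.0, 1.0, 1.0))  # 0.5

# gamma = 4 > nu = 2: this hazard rises and then falls (upside-down bathtub)
for t in (0.5, 1.0, 5.0):
    print(t, round(pgw_hazard(t, 1.0, 2.0, 4.0), 4))
```

Because h(t) is the derivative of -log S(t), the two functions can be checked against each other numerically, which is a useful sanity test before plugging them into a censored-data likelihood.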

(This article belongs to the Special Issue Methods and Applications of Advanced Statistical Analysis, 2nd Edition)


Search Results (503)
