Search Results (2,301)

Search Parameters:
Keywords = KNN—K-nearest neighbor

18 pages, 2986 KB  
Article
Comparing Statistical and Machine-Learning Models for Seasonal Prediction of Atlantic Hurricane Activity
by Xiaoran Chen and Lian Xie
Atmosphere 2026, 17(2), 129; https://doi.org/10.3390/atmos17020129 - 26 Jan 2026
Abstract
Tropical cyclones pose major risks to life and property, especially as coastal populations grow and climate change increases the likelihood of intense storms, making seasonal prediction of tropical cyclones an important scientific and societal goal. This study uses HURDAT best-track records from 1950 to 2024 to quantify annual tropical cyclone, hurricane, and major hurricane counts across the Atlantic basin, Caribbean Sea, and Gulf of Mexico. These nine targets are paired with 34 monthly climate predictors from NOAA and NASA GISS—including SST and ENSO indices, Main Development Region (MDR) wind and pressure fields, and latent heat flux empirical orthogonal functions—evaluated under nine predictor-set configurations. Four forecasting approaches were developed and tested under operationally realistic conditions—Lasso regression, K-nearest neighbors (KNN), an artificial neural network (ANN), XGBoost—using a 30-year sliding-window cross-validation design and a Poisson log-likelihood skill score relative to climatology. Lasso performs reliably with concise, physically interpretable predictors, while XGBoost provides the most consistent overall skill, particularly for basin-wide total cyclone and hurricane counts. The skill of ANN is limited by small sample sizes, and KNN offers only marginal improvements. Forecast skill is the highest for basin-wide storm totals and decreases for regional major-hurricane targets due to lower event frequencies and stronger predictability limits. Full article
(This article belongs to the Special Issue Machine Learning for Atmospheric and Remote Sensing Research)
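The 30-year sliding-window evaluation and the Poisson log-likelihood skill score mentioned in the abstract above can be sketched roughly as follows. This is an illustrative sketch only: the abstract does not give the exact skill formula, so the form `1 - LL_forecast / LL_climatology` is an assumption, and all function names and toy numbers are hypothetical.

```python
import math

def sliding_window_splits(n_years, window=30):
    """Yield (train_indices, test_index): train on `window` consecutive
    years and test on the year that immediately follows them."""
    for test in range(window, n_years):
        yield list(range(test - window, test)), test

def poisson_loglik(counts, rates):
    """Sum of Poisson log-probabilities of observed counts given
    forecast rates (each rate must be > 0)."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1)
               for y, mu in zip(counts, rates))

def skill_vs_climatology(counts, forecast_rates, clim_rates):
    """Assumed skill form: 1 - LL_forecast / LL_climatology.
    Positive values mean the forecast beats climatology."""
    return 1.0 - (poisson_loglik(counts, forecast_rates)
                  / poisson_loglik(counts, clim_rates))

# Toy data (hypothetical): four seasons of storm counts.
observed = [3, 7, 5, 9]
climatology = [6.0] * 4            # constant long-term mean rate
forecast = [3.5, 6.5, 5.0, 8.5]    # a forecast that tracks the counts

splits = list(sliding_window_splits(75))   # 1950-2024 spans 75 seasons
skill = skill_vs_climatology(observed, forecast, climatology)
```

In a real evaluation the climatological rate would presumably be recomputed from each training window rather than fixed, so that the reference forecast sees the same data as the model.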

19 pages, 1261 KB  
Article
Predictive Modeling of Food Extrusion Using Hemp Residues: A Machine Learning Approach for Sustainable Ruminant Nutrition
by Aylin Socorro Saenz Santillano, Damián Reyes Jáquez, Rubén Guerrero Rivera, Efrén Delgado, Hiram Medrano Roldan and Josué Ortiz Medina
Processes 2026, 14(3), 418; https://doi.org/10.3390/pr14030418 - 25 Jan 2026
Abstract
Predictive modeling of extrusion processes through machine learning (ML) offers significant improvements over classical response surface methodology (RSM) when addressing nonlinear and multivariable systems. This study evaluated hemp residues (Cannabis sativa) as a non-conventional ingredient in ruminant diets and compared the performance of polynomial regression models against several ML algorithms, including artificial neural networks (ANNs), random forest (RF), K-nearest neighbors (KNN), and XGBoost. Three experimental datasets from previous extrusion studies were concatenated with new laboratory experiments, creating a unified database in Excel. Input variables included extrusion parameters (temperature, screw speed, and moisture) and formulation components, while output variables comprised expansion index, bulk density (BD), penetration force, water absorption index, and water solubility index. Data preprocessing involved robust z-score detection of outliers (MAD criterion) with intra-group winsorization, followed by normalization to a [−1, +1] range. Hyperparameter optimization of ANN models was performed with Optuna, and all algorithms were evaluated through 5-fold cross-validation and independent external validation sets. Results demonstrated that ML models consistently outperformed quadratic regression, with ANNs achieving R² > 0.80 for BD and water solubility index, and RF excelling in predicting solubility. These findings establish machine learning as a robust predictive framework for extrusion processes and highlight hemp residues as a sustainable feed ingredient with potential to improve ruminant nutrition and reduce environmental impacts. Full article
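The preprocessing steps named in the abstract above (robust z-score outlier detection with the MAD criterion, winsorization, then scaling to [−1, +1]) might look roughly like this for a single group of values. The cutoff of 3.5 and the 0.6745 consistency constant are common conventions assumed here, not values stated in the abstract, and the function names are hypothetical.

```python
from statistics import median

def winsorize_by_mad(values, threshold=3.5):
    """Clip values whose robust z-score 0.6745 * (x - median) / MAD
    exceeds `threshold` in magnitude (threshold is an assumed default)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:                       # no spread: nothing to clip
        return list(values)
    limit = threshold * mad / 0.6745   # largest allowed |x - median|
    lo, hi = med - limit, med + limit
    return [min(max(v, lo), hi) for v in values]

def scale_to_unit_range(values):
    """Min-max scale values linearly onto [-1, +1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

# Toy measurements (hypothetical) with one obvious outlier.
raw = [1.0, 2.0, 3.0, 4.0, 100.0]
clipped = winsorize_by_mad(raw)
scaled = scale_to_unit_range(clipped)
```

For the intra-group variant described in the abstract, the same two functions would be applied separately to each experimental group before the groups are recombined.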

15 pages, 2981 KB  
Article
Capacity-Limited Failure in Approximate Nearest Neighbor Search on Image Embedding Spaces
by Morgan Roy Cooper and Mike Busch
J. Imaging 2026, 12(2), 55; https://doi.org/10.3390/jimaging12020055 - 25 Jan 2026
Abstract
Similarity search on image embeddings is a common practice for image retrieval in machine learning and pattern recognition systems. Approximate nearest neighbor (ANN) methods enable scalable similarity search on large datasets, often approaching sub-linear complexity. Yet, little empirical work has examined how ANN neighborhood geometry differs from that of exact k-nearest neighbors (k-NN) search as the neighborhood size increases under constrained search effort. This study quantifies how approximate neighborhood structure changes relative to exact k-NN search as k increases across three experimental conditions. Using multiple random subsets of 10,000 images drawn from the STL-10 dataset, we compute ResNet-50 image embeddings, perform an exact k-NN search, and compare it to a Hierarchical Navigable Small World (HNSW)-based ANN search under controlled hyperparameter regimes. We evaluated the fidelity of neighborhood structure using neighborhood overlap, average neighbor distance, normalized barycenter shift, and local intrinsic dimensionality (LID). Results show that exact k-NN and ANN search behave nearly identically when efSearch > k. However, as the neighborhood size grows and efSearch remains fixed, ANN search fails abruptly, exhibiting extreme divergence in neighbor distances at approximately k ≈ 2–3.5 × efSearch. Increasing index construction quality delays this failure, and scaling search effort proportionally with neighborhood size (efSearch = α × k with α ≥ 1) preserves neighborhood geometry across all evaluated metrics, including LID. The findings indicate that ANN search preserves neighborhood geometry within its operational capacity but abruptly fails when this capacity is exceeded. Documenting this behavior is relevant for scientific applications that approximate embedding spaces and provides practical guidance on when ANN search is interchangeable with exact k-NN and when geometric differences become nontrivial. Full article
(This article belongs to the Section Image and Video Processing)
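Two of the fidelity metrics listed in the abstract above, neighborhood overlap and average neighbor distance, can be illustrated with a brute-force exact search. This is a minimal sketch: the approximate neighbor list is a hypothetical stand-in for the output of an HNSW index, and `scaled_ef_search` only encodes the efSearch = α × k rule quoted in the abstract.

```python
import math

def exact_knn(points, query, k):
    """Brute-force exact k-NN: indices of the k points closest to query."""
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(points[i], query))
    return order[:k]

def neighborhood_overlap(exact_ids, approx_ids):
    """Fraction of the exact neighbors that the approximate search found."""
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

def mean_neighbor_distance(points, query, ids):
    """Average Euclidean distance from the query to the listed neighbors."""
    return sum(math.dist(points[i], query) for i in ids) / len(ids)

def scaled_ef_search(k, alpha):
    """Search effort that grows with neighborhood size: ceil(alpha * k)."""
    return math.ceil(alpha * k)

# Toy embedding space (hypothetical 2-D points).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (5.0, 5.0)]
query = (0.0, 0.0)
exact_ids = exact_knn(points, query, k=3)
approx_ids = [0, 1, 3]   # hypothetical ANN result that missed one neighbor
overlap = neighborhood_overlap(exact_ids, approx_ids)
```

A drop in overlap together with a jump in `mean_neighbor_distance` for the approximate list is the kind of divergence the abstract reports once k outgrows the search budget.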

16 pages, 1974 KB  
Article
Edible Oil Adulteration Analysis via QPCA and PSO-LSSVR Based on 3D-FS
by Si-Yuan Wang, Qi-Yang Liu, Ai-Ling Tan and Linan Liu
Processes 2026, 14(2), 390; https://doi.org/10.3390/pr14020390 - 22 Jan 2026
Viewed by 63
Abstract
A method utilizing quaternion principal component analysis (QPCA) for three-dimensional fluorescence spectral (3D FS) feature extraction is employed to identify frying oil in edible oil. Particle swarm optimization least squares support vector regression (PSO-LSSVR) is utilized for detecting frying oil concentration. The study includes rapeseed oil, soybean oil, peanut oil, blended oil, and corn oil samples. Adulteration involves adding frying oil to these edible oils at concentrations of 0%, 5%, 10%, 30%, 50%, 70%, and 100%. Firstly, the F7000 fluorescence spectrometer is employed to measure the 3D FS of the adulterated edible oil samples, resulting in the generation of contour maps and 3D FS projections. The excitation wavelengths utilized in these measurements are 360 nm, 380 nm, and 400 nm, while the emission wavelengths span from 220 nm to 900 nm. Secondly, leveraging the automatic peak-finding function of the spectrometer, a quaternion parallel representation model of the 3D FS data for frying oil in edible oil is established using the emission spectra data corresponding to the aforementioned excitation wavelengths. Subsequently, in conjunction with K-nearest neighbor (KNN) classification, three feature extraction methods (summation, modulus, and multiplication quaternion feature extraction) are compared to identify the optimal approach. Thirdly, the extracted features are input into KNN, particle swarm optimization support vector machine (PSO-SVM), and genetic algorithm support vector machine (GA-SVM) classifiers to ascertain the most effective discriminant model for adulterated edible oil. Ultimately, a quantitative model for adulterated edible oil is developed based on partial least squares regression, PSO-SVR, and PSO-LSSVR. The results indicate that the classification accuracy of QPCA features combined with PSO-SVM achieved 100%. Furthermore, the PSO-LSSVR quantitative model exhibited the best performance. Full article

27 pages, 5594 KB  
Article
Conditional Tabular Generative Adversarial Network Based Clinical Data Augmentation for Enhanced Predictive Modeling in Chronic Kidney Disease Diagnosis
by Princy Randhawa, Veerendra Nath Jasthi, Kumar Piyush, Gireesh Kumar Kaushik, Malathy Batamulay, S. N. Prasad, Manish Rawat, Kiran Veernapu and Nithesh Naik
BioMedInformatics 2026, 6(1), 6; https://doi.org/10.3390/biomedinformatics6010006 - 22 Jan 2026
Viewed by 106
Abstract
The lack of clinical data for chronic kidney disease (CKD) prediction frequently results in model overfitting and inadequate generalization to novel samples. This research mitigates this constraint by utilizing a Conditional Tabular Generative Adversarial Network (CTGAN) to enhance a constrained CKD dataset sourced from the University of California, Irvine (UCI) Machine Learning Repository. The CTGAN model was trained to produce realistic synthetic samples that preserve the statistical and feature distributions of the original dataset. Multiple machine learning models, such as AdaBoost, Random Forest, Gradient Boosting, and K-Nearest Neighbors (KNN), were assessed on both the original and enhanced datasets with incrementally increasing degrees of synthetic data dilution. AdaBoost attained 100% accuracy on the original dataset, signifying considerable overfitting; however, the model exhibited enhanced generalization and stability with the CTGAN-augmented data. The occurrence of 100% test accuracy in several models should not be interpreted as realistic clinical performance. Instead, it reflects the limited size, clean structure, and highly separable feature distributions of the UCI CKD dataset. Similar behavior has been reported in multiple previous studies using this dataset. Such perfect accuracy is a strong indication of overfitting and limited generalizability, rather than feature or label leakage. This observation directly motivates the need for controlled data augmentation to introduce variability and improve model robustness. The dataset with the greatest dilution, comprising 2000 synthetic cases, attained a test accuracy of 95.27% utilizing a stochastic gradient boosting approach. Ensemble learning techniques, particularly gradient boosting and random forest, regularly surpassed conventional models like KNN in terms of predicted accuracy and resilience. 
The results demonstrate that CTGAN-based data augmentation introduces critical variability, diminishes model bias, and serves as an effective regularization technique. This method provides a viable alternative for reducing overfitting and improving predictive modeling accuracy in data-deficient medical fields, such as chronic kidney disease diagnosis. Full article

22 pages, 7096 KB  
Article
An Improved ORB-KNN-Ratio Test Algorithm for Robust Underwater Image Stitching on Low-Cost Robotic Platforms
by Guanhua Yi, Tianxiang Zhang, Yunfei Chen and Dapeng Yu
J. Mar. Sci. Eng. 2026, 14(2), 218; https://doi.org/10.3390/jmse14020218 - 21 Jan 2026
Viewed by 67
Abstract
Underwater optical images often exhibit severe color distortion, weak texture, and uneven illumination due to light absorption and scattering in water. These issues result in unstable feature detection and inaccurate image registration. To address these challenges, this paper proposes an underwater image stitching method that integrates ORB (Oriented FAST and Rotated BRIEF) feature extraction with a fixed-ratio constraint matching strategy. First, lightweight color and contrast enhancement techniques are employed to restore color balance and improve local texture visibility. Then, ORB descriptors are extracted and matched via a KNN (K-Nearest Neighbors) nearest-neighbor search, and Lowe’s ratio test is applied to eliminate false matches caused by weak texture similarity. Finally, the geometric transformation between image frames is estimated by incorporating robust optimization, ensuring stable homography computation. Experimental results on real underwater datasets show that the proposed method significantly improves stitching continuity and structural consistency, achieving 40–120% improvements in SSIM (Structural Similarity Index) and PSNR (peak signal-to-noise ratio) over conventional Harris–ORB + KNN, SIFT (scale-invariant feature transform) + BF (brute force), SIFT + KNN, and AKAZE (accelerated KAZE) + BF methods while maintaining processing times within one second. These results indicate that the proposed method is well-suited for real-time underwater environment perception and panoramic mapping on low-cost, micro-sized underwater robotic platforms. Full article
(This article belongs to the Section Ocean Engineering)
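The matching step described in the abstract above, a two-nearest-neighbor search over binary descriptors followed by Lowe's ratio test, can be sketched in plain Python. Descriptors are modeled here as small integers compared by Hamming distance, as is usual for ORB-style binary descriptors. The ratio of 0.75 is a commonly used value and an assumption, since the abstract only says a fixed ratio is applied.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

def ratio_test_matches(query_desc, train_desc, ratio=0.75):
    """Keep a match only when the best candidate is clearly closer than
    the second best: d_best < ratio * d_second (Lowe's ratio test)."""
    matches = []
    for qi, q in enumerate(query_desc):
        ranked = sorted(range(len(train_desc)),
                        key=lambda ti: hamming(q, train_desc[ti]))
        if len(ranked) < 2:
            continue                   # ratio test needs two candidates
        best, second = ranked[0], ranked[1]
        if hamming(q, train_desc[best]) < ratio * hamming(q, train_desc[second]):
            matches.append((qi, best))
    return matches

# Toy 8-bit descriptors (hypothetical).
query_desc = [0b11110000, 0b00001010]
train_desc = [0b11110001, 0b00001111, 0b00001011, 0b00001000]
matches = ratio_test_matches(query_desc, train_desc)
```

The first query descriptor has one clearly best candidate and is kept. The second has two equally close candidates, so it is discarded as ambiguous, which is the weak-texture failure case the ratio test is meant to filter out.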

19 pages, 5277 KB  
Article
A Machine Learning Approach Using Spatially Explicit K-Nearest Neighbors for House Price Predictions
by Meifang Chen, Changho Lee and Yongwan Chun
ISPRS Int. J. Geo-Inf. 2026, 15(1), 46; https://doi.org/10.3390/ijgi15010046 - 21 Jan 2026
Viewed by 117
Abstract
Spatial data has distinctive properties that differentiate it from non-spatial data. One prominent characteristic is spatial autocorrelation (SA). When machine learning techniques are applied for spatial data modeling, they require spatially explicit consideration. If these inherent spatial structures are ignored, models may produce biased predictions. However, integrating this property into the model yields additional spatial insight, thereby enhancing learning and improving predictive accuracy. This study examines spatially explicit K-nearest neighbors (SE-KNN) by integrating SA as a spatially explicit property, λ, into the learning algorithm. The innovation of SE-KNN lies in its alignment with the principle of spatial autocorrelation: KNN’s core learning assumption, that similar observations tend to have similar outcomes, naturally parallels spatial dependence. The proposed SE-KNN is applied to a house price prediction model using house sales data from Franklin County, Ohio to demonstrate a spatially dependent, data-rich, and real-world problem. The results show that SE-KNN achieved the best prediction accuracy, measured by mean absolute error (MAE), compared with three other machine learning approaches (i.e., standard KNN, linear regression, and artificial neural networks). The proposed method effectively captures the spatial structures in the housing market and leaves only a trace amount of SA in the residuals. Full article
(This article belongs to the Special Issue Spatial Data Science and Knowledge Discovery)

13 pages, 6367 KB  
Article
Gene Expression-Based Colorectal Cancer Prediction Using Machine Learning and SHAP Analysis
by Yulai Yin, Zhen Yang, Xueqing Li, Shuo Gong and Chen Xu
Genes 2026, 17(1), 114; https://doi.org/10.3390/genes17010114 - 20 Jan 2026
Viewed by 216
Abstract
Objective: To develop and validate a genetic diagnostic model for colorectal cancer (CRC). Methods: First, differential expression genes (DEGs) between colorectal cancer and normal groups were screened using the TCGA database. Subsequently, a two-sample Mendelian randomization analysis was performed using the eQTL genomic data from the IEU OpenGWAS database and colorectal cancer outcomes from the R12 Finnish database to identify associated genes. The intersecting genes from both methods were selected for the development and validation of the CRC genetic diagnostic model using nine machine learning algorithms: Lasso Regression, XGBoost, Gradient Boosting Machine (GBM), Generalized Linear Model (GLM), Neural Network (NN), Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Random Forest (RF), and Decision Tree (DT). Results: A total of 3716 DEGs were identified from the TCGA database, while 121 genes were associated with CRC based on the eQTL Mendelian randomization analysis. The intersection of these two methods yielded 27 genes. Among the nine machine learning methods, XGBoost achieved the highest AUC value of 0.990. The top five genes predicted by the XGBoost method—RIF1, GDPD5, DBNDD1, RCCD1, and CLDN5—along with the five most significantly differentially expressed genes (ASCL2, IFITM3, IFITM1, SMPDL3A, and SUCLG2) in the GSE87211 dataset, were selected for the construction of the final colorectal cancer (CRC) genetic diagnostic model. The ROC curve analysis revealed an AUC (95% CI) of 0.9875 (0.9737–0.9875) for the training set, and 0.9601 (0.9145–0.9601) for the validation set, indicating strong predictive performance of the model. SHAP model interpretation further identified IFITM1 and DBNDD1 as the most influential genes in the XGBoost model, with both making positive contributions to the model’s predictions. 
Conclusions: The gene expression profile in colorectal cancer is characterized by enhanced cell proliferation, elevated metabolic activity, and immune evasion. A genetic diagnostic model constructed based on ten genes (RIF1, GDPD5, DBNDD1, RCCD1, CLDN5, ASCL2, IFITM3, IFITM1, SMPDL3A, and SUCLG2) demonstrates strong predictive performance. This model holds significant potential for the early diagnosis and intervention of colorectal cancer, contributing to the implementation of third-tier prevention strategies. Full article
(This article belongs to the Section Bioinformatics)

28 pages, 5076 KB  
Article
Comparative Evaluation of EMG Signal Classification Techniques Across Temporal, Frequency, and Time-Frequency Domains Using Machine Learning
by Jose Manuel Lopez-Villagomez, Juan Manuel Lopez-Hernandez, Ruth Ivonne Mata-Chavez, Carlos Rodriguez-Donate, Yeraldyn Guzman-Castro and Eduardo Cabal-Yepez
Appl. Sci. 2026, 16(2), 1058; https://doi.org/10.3390/app16021058 - 20 Jan 2026
Viewed by 127
Abstract
This study focuses on classifying electromyographic (EMG) signals to identify seven specific hand movements, including complete hand closure, individual finger closures, and a pincer grip. Accurately distinguishing these movements is challenging due to overlapping muscle activation patterns. To address this, a methodology structured in five stages was developed: placement of electrodes on specific forearm muscles to capture electrical activity during movements; acquisition of EMG signals from twelve participants performing the seven types of movements; preprocessing of the signals through filtering and rectification to enhance quality, followed by the extraction of features from three distinct types of preprocessed signals—filtered, rectified, and envelope signals—to facilitate analysis in the temporal, frequency, and time–frequency domains; extraction of relevant features such as amplitude, shape, symmetry, and frequency variance; and classification of the signals using eight machine learning algorithms: support vector machine (SVM), multiclass logistic regression, k-nearest neighbors (k-NN), Bayesian classifier, artificial neural network (ANN), random forest, XGBoost, and LightGBM. The performance of each algorithm was evaluated using different sets of features derived from the preprocessed signals to identify the most effective approach for classifying hand movements. Additionally, the impact of various signal representations on classification accuracy was examined. Experimental results indicated that some algorithms, especially when an expanded set of features was utilized, achieved improved accuracy in classifying hand movements. These findings contribute to the development of more efficient control systems for myoelectric prostheses and offer insights for future research in EMG signal processing and pattern recognition. Full article
(This article belongs to the Section Computing and Artificial Intelligence)

33 pages, 550 KB  
Article
Intelligent Information Processing for Corporate Performance Prediction: A Hybrid Natural Language Processing (NLP) and Deep Learning Approach
by Qidi Yu, Chen Xing, Yanjing He, Sunghee Ahn and Hyung Jong Na
Electronics 2026, 15(2), 443; https://doi.org/10.3390/electronics15020443 - 20 Jan 2026
Viewed by 137
Abstract
This study proposes a hybrid machine learning framework that integrates structured financial indicators and unstructured textual strategy disclosures to improve firm-level management performance prediction. Using corporate business reports from South Korean listed firms, strategic text was extracted and categorized under the Balanced Scorecard (BSC) framework into financial, customer, internal process, and learning and growth dimensions. Various machine learning and deep learning models—including k-nearest neighbors (KNNs), support vector machine (SVM), light gradient boosting machine (LightGBM), convolutional neural network (CNN), long short-term memory (LSTM), autoencoder, and transformer—were evaluated, with results showing that the inclusion of strategic textual data significantly enhanced prediction accuracy, precision, recall, area under the curve (AUC), and F1-score. Among individual models, the transformer architecture demonstrated superior performance in extracting context-rich semantic features. A soft-voting ensemble model combining autoencoder, LSTM, and transformer achieved the best overall performance, leading in accuracy and AUC, while the best single deep learning model (transformer) obtained a marginally higher F1 score, confirming the value of hybrid learning. Furthermore, analysis revealed that customer-oriented strategy disclosures were the most predictive among BSC dimensions. These findings highlight the value of integrating financial and narrative data using advanced NLP and artificial intelligence (AI) techniques to develop interpretable and robust corporate performance forecasting models. In addition, we operationalize information security narratives using a reproducible cybersecurity lexicon and derive security disclosure intensity and weight share features that are jointly evaluated with BSC-based strategic vectors. Full article
(This article belongs to the Special Issue Advances in Intelligent Information Processing)
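The soft-voting ensemble mentioned in the abstract above combines models by averaging their predicted class probabilities rather than counting their hard votes. A minimal sketch follows, assuming equal weights by default. The three probability rows are hypothetical outputs standing in for the autoencoder, LSTM, and transformer models.

```python
def soft_vote(prob_rows, weights=None):
    """Average per-class probabilities across models and return
    (winning_class_index, averaged_probabilities)."""
    n_models = len(prob_rows)
    weights = weights or [1.0] * n_models
    total = sum(weights)
    n_classes = len(prob_rows[0])
    avg = [sum(w * row[c] for w, row in zip(weights, prob_rows)) / total
           for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: avg[c])
    return label, avg

# Hypothetical class probabilities from three models for one firm.
model_probs = [
    [0.90, 0.10],   # very confident in class 0
    [0.40, 0.60],   # leans toward class 1
    [0.45, 0.55],   # leans toward class 1
]
label, avg = soft_vote(model_probs)
```

Here two of the three models lean toward class 1, yet the averaged probabilities favor class 0 because the first model is far more confident. This sensitivity to confidence is what distinguishes soft voting from a simple majority vote.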

16 pages, 6135 KB  
Article
Interlayer Identification Method Based on SMOTE and Ensemble Learning
by Shengqiang Luo, Bing Yu, Tianrui Zhang, Junqing Rong, Qing Zeng, Tingting Feng and Jianpeng Zhao
Processes 2026, 14(2), 351; https://doi.org/10.3390/pr14020351 - 19 Jan 2026
Viewed by 131
Abstract
The interlayer is a key geological factor that regulates reservoir heterogeneity and remaining oil distribution, and its accurate identification directly affects the reservoir development effect. To address the strong subjectivity of traditional identification methods and the insufficient recognition accuracy of single machine learning models under imbalanced sample distributions, this study focuses on three types of interlayers (argillaceous, calcareous, and petrophysical interlayers) in the W Oilfield, and proposes an accurate identification method integrating the Synthetic Minority Over-Sampling Technique (SMOTE) and heterogeneous ensemble learning. Firstly, the corresponding data set of interlayer type and logging response is established. After eliminating the influence of dimension using normalization, the sensitive logging curves are optimized using the crossplot method, mutual information, and effect analysis. SMOTE is used to balance the sample distribution and solve the problem of the identification deviation of minority interlayers. Then, a heterogeneous ensemble model composed of the k-nearest neighbor algorithm (KNN), decision tree (DT), and support vector machine (SVM) is constructed, and the final recognition result is output using a voting strategy. The experiments show that SMOTE improves the average accuracy of a single model by 3.9% and effectively reduces the model bias caused by sample imbalance. The heterogeneous ensemble model improves the overall recognition accuracy to 92.6%, significantly enhances the ability to distinguish argillaceous and petrophysical interlayers, and improves the F1-score at the same time. This method features high accuracy and reliable performance, providing robust support for interlayer identification in reservoir geological modeling and for tapping remaining oil potential, and demonstrating prominent practical application value. Full article
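The two ingredients of the method in the abstract above, SMOTE oversampling and a voting ensemble, can be illustrated with a small sketch. It shows only the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbors) and a plain majority vote. The neighbor count, seed, and toy feature values are assumptions, not settings from the study.

```python
import math
import random
from collections import Counter

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples, each a random point on the
    segment between a minority sample and one of its k nearest minority
    neighbors. Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted((p for p in minority if p is not base),
                           key=lambda p: math.dist(p, base))[:k]
        other = rng.choice(neighbors)
        gap = rng.random()             # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b)
                               for b, o in zip(base, other)))
    return synthetic

def majority_vote(labels):
    """Hard voting: return the label predicted by the most models."""
    return Counter(labels).most_common(1)[0][0]

# Toy minority-class samples (hypothetical normalized logging responses).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_samples = smote(minority, n_new=5)

# Hypothetical predictions from the KNN, DT, and SVM members.
final_label = majority_vote(["argillaceous", "calcareous", "argillaceous"])
```

Because each synthetic point is a convex combination of two real minority samples, it always falls inside the region the minority class already occupies, which is why SMOTE adds variety without inventing values outside the observed range.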

23 pages, 947 KB  
Article
Machine Learning-Based Prediction of Coronary Artery Disease Using Clinical and Behavioral Data: A Comparative Study
by Abdulkadir Çakmak, Gülşah Akyilmaz, Aybike Gizem Köse, Gökhan Keskin and Levent Uğur
Diagnostics 2026, 16(2), 318; https://doi.org/10.3390/diagnostics16020318 - 19 Jan 2026
Viewed by 200
Abstract
Background and Objectives: Coronary artery disease (CAD) is a leading cause of morbidity and mortality worldwide. An early and accurate diagnosis is essential for effective clinical management and risk stratification. Recent advances in machine learning (ML) have provided opportunities to enhance the diagnostic performance by integrating multidimensional patient data. This study aimed to develop and compare several supervised ML algorithms for early CAD diagnosis using demographic, anthropometric, biochemical, and psychosocial parameters. Materials and Methods: A total of 300 adult patients (165 CAD-positive and 135 controls) were retrospectively analyzed using a dataset comprising 21 biochemical markers, body composition metrics, and self-reported eating behavior scores. Six ML algorithms, k-nearest neighbors (k-NNs), support vector machines (SVMs), artificial neural networks (ANNs), logistic regression (LR), naïve Bayes (NB), and decision trees (DTs), were trained and evaluated using 10-fold cross-validation. Model performance was assessed based on accuracy, sensitivity, false-negative rate, and area under the Receiver Operating Characteristic (ROC) curve (AUC). Results: The k-NN model achieved the highest performance, with 98.33% accuracy and an AUC of 0.99, followed by SVM (96.67%, AUC = 0.95) and ANN (95.33%, AUC = 0.98). Patients with CAD exhibited significantly higher levels of glucose, triglycerides (TGs), LDL cholesterol (LDL-C), and abdominal obesity, while vitamin B12 levels were lower (p < 0.001). Although emotional and mindful eating scores differed significantly between the groups, their contribution to model performance was limited. Conclusions: Machine learning models, particularly k-NN, SVM, and ANN, have demonstrated high accuracy in distinguishing CAD patients from healthy controls when applied to a diverse set of clinical and behavioral variables. This study highlights the potential of integrating psychosocial and clinical data to enhance CAD prediction models beyond traditional biomarkers.
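
For readers unfamiliar with the evaluation scheme named in this abstract, the following is a minimal sketch of k-NN scored with 10-fold cross-validation, reporting accuracy, sensitivity, and false-negative rate. It is not the authors' code: the toy two-feature dataset, the function names, the choice of k = 3, and the decision to pool predictions across folds rather than average per-fold scores are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle indices 0..n-1 and deal them into n_folds disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cross_validate(X, y, n_folds=10, k=3, positive=1):
    """Pool held-out predictions over all folds, then compute the metrics."""
    tp = tn = fp = fn = 0
    for test_idx in k_fold_indices(len(X), n_folds):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in test_idx:
            pred = knn_predict(train_X, train_y, X[i], k)
            if y[i] == positive:
                tp, fn = (tp + 1, fn) if pred == positive else (tp, fn + 1)
            else:
                fp, tn = (fp + 1, tn) if pred == positive else (fp, tn + 1)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "false_negative_rate": 1 - sensitivity,
    }


# Hypothetical toy data: two well-separated 5x5 grids of points, one per class.
grid = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
X = grid + [(a + 5.0, b + 5.0) for a, b in grid]
y = [0] * len(grid) + [1] * len(grid)

metrics = cross_validate(X, y)
print(metrics)
```

On real clinical data the features would normally be standardized before computing distances, since k-NN is sensitive to feature scale; that step is omitted here for brevity.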

23 pages, 5052 KB  
Article
Exploratory Study on Hybrid Systems Performance: A First Approach to Hybrid ML Models in Breast Cancer Classification
by Francisco J. Rojas-Pérez, José R. Conde-Sánchez, Alejandra Morlett-Paredes, Fernando Moreno-Barbosa, Julio C. Ramos-Fernández, José Luna-Muñoz, Genaro Vargas-Hernández, Blanca E. Jaramillo-Loranca, Juan M. Xicotencatl-Pérez and Eucario G. Pérez-Pérez
AI 2026, 7(1), 29; https://doi.org/10.3390/ai7010029 - 15 Jan 2026
Viewed by 220
Abstract
The classification of breast cancer using machine learning techniques has become a critical tool in modern medical diagnostics. This study analyzes the performance of hybrid models that combine traditional machine learning algorithms (TMLAs) with a convolutional neural network (CNN)-based VGG16 model for feature extraction to improve accuracy for classifying eight breast cancer subtypes (BCS). The methodology consists of three steps. First, image preprocessing is performed on the BreakHis dataset at 400× magnification, which contains 1820 histopathological images classified into eight BCS. Second, the CNN VGG16 is modified to function as a feature extractor that converts images into representative vectors. These vectors constitute the training set for TMLAs, such as Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Naive Bayes (NB), leveraging VGG16’s ability to capture relevant features. Third, k-fold cross-validation is applied to evaluate the model’s performance by averaging the metrics obtained across all folds. The results reveal that hybrid models leveraging a CNN-based VGG16 model for feature extraction, followed by TMLAs, achieve outstanding experimental accuracy. The KNN-based hybrid model stood out with a precision of 0.97, accuracy of 0.96, sensitivity of 0.96, specificity of 0.99, F1-score of 0.96, and ROC-AUC of 0.97. These findings suggest that, with an appropriate methodology, hybrid models based on TMLA have strong potential in classification tasks, offering a balance between performance and predictive capability.

20 pages, 3268 KB  
Article
Portable Electronic Olfactometer for Non-Invasive Screening of Canine Ehrlichiosis: A Proof-of-Concept Study Using Machine Learning
by Silvana Valentina Durán Cotrina, Cristhian Manuel Durán Acevedo and Jeniffer Katerine Carrillo Gómez
Vet. Sci. 2026, 13(1), 88; https://doi.org/10.3390/vetsci13010088 - 15 Jan 2026
Viewed by 228
Abstract
Canine ehrlichiosis, caused by Ehrlichia canis, represents a relevant challenge in veterinary medicine, particularly in resource-limited settings where access to laboratory-based diagnostics may be constrained. This pilot and exploratory study aimed to evaluate the feasibility of a portable electronic olfactometer as a non-invasive screening approach, based on the analysis of volatile organic compounds (VOCs) present in breath, saliva, and hair samples from dogs. Signals were acquired using an array of eight metal-oxide (MOX) gas sensors (MQ and TGS series). After preprocessing, principal component analysis (PCA) was applied for dimensionality reduction, and the resulting features were analyzed using supervised machine-learning classifiers, including AdaBoost, support vector machines (SVM), k-nearest neighbors (k-NN), and Random Forests (RF). A total of 38 dogs (19 PCR-confirmed infected cases and 19 controls) were analyzed, generating 114 samples evenly distributed across the three biological matrices. Among the evaluated models, SVM showed the most consistent performance, particularly for saliva samples, achieving an accuracy, sensitivity, and precision of 94.7% (AUC = 0.964). In contrast, breath and hair samples showed lower discriminative performance. Given the limited sample size and the exploratory nature of the study, these results should be interpreted as preliminary; nevertheless, they suggest that electronic olfactometry may represent a complementary, low-cost, non-invasive screening tool for future research on canine ehrlichiosis, rather than a standalone diagnostic method.

34 pages, 4013 KB  
Article
Machine Learning-Based Cyber Fraud Detection: A Comparative Study of Resampling Methods for Imbalanced Credit Card Data
by Eyad Btoush, Thaeer Kobbaey, Hatem Tamimi and Xujuan Zhou
Appl. Sci. 2026, 16(2), 850; https://doi.org/10.3390/app16020850 - 14 Jan 2026
Viewed by 165
Abstract
The prevalence of online transactions and extensive adoption of credit card payments have contributed to the escalation of credit card cyber fraud in modern society. These trends are propelled by technological advancements, which provide fraudulent actors with more opportunities. Fraudsters exploit victims’ financial vulnerabilities by obtaining illegal access to sensitive credit card information through deceptive means, such as phishing, fraudulent phone calls, and fraudulent SMS messages. This study predicts and detects potential instances of cyber fraud in credit card transactions by employing Machine Learning (ML) techniques, including Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), XGBoost, and CatBoost, together with sampling techniques such as Tomek Link, Synthetic Minority Oversampling Technique (SMOTE), Edited Nearest Neighbor (ENN), Tomek+ENN, and SMOTE+ENN. We conducted a comparative analysis of these extant ML techniques to determine their performance for credit card cyber fraud detection in terms of accuracy, precision, recall, F1 score, and ROC-AUC.
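
As an illustration of one of the sampling techniques listed in this abstract, the sketch below finds Tomek links (pairs of opposite-class points that are each other's nearest neighbor) and removes the majority-class member of each pair. It is a plain-Python sketch, not the authors' pipeline: the six toy points, the function names, and the policy of dropping only majority samples are assumptions chosen for illustration.

```python
import math


def nearest_neighbor(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    others = (j for j in range(len(points)) if j != i)
    return min(others, key=lambda j: math.dist(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j), i < j, of opposite-class mutual nearest neighbors."""
    links = []
    for i in range(len(points)):
        j = nearest_neighbor(points, i)
        if i < j and labels[i] != labels[j] and nearest_neighbor(points, j) == i:
            links.append((i, j))
    return links


def remove_majority_in_links(points, labels, majority_label):
    """Undersample by dropping the majority-class member of every Tomek link."""
    drop = {
        idx
        for pair in tomek_links(points, labels)
        for idx in pair
        if labels[idx] == majority_label
    }
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy data: label 0 is the majority class, label 1 the minority.
# The majority point (1.9, 2.0) sits right next to the minority point (2.0, 2.0).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.9, 2.0), (2.0, 2.0), (3.0, 3.0)]
labels = [0, 0, 0, 0, 1, 1]

links = tomek_links(points, labels)
cleaned_points, cleaned_labels = remove_majority_in_links(points, labels, majority_label=0)
print(links, cleaned_labels)
```

The brute-force neighbor search is quadratic in the number of points, which is acceptable for a demonstration but not for a full transaction dataset; a library implementation with an indexed neighbor search would be the usual choice there.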