Search Results (33)

Search Parameters:
Keywords = random k-label sets

21 pages, 2013 KB  
Article
Machine Learning Models for Reliable Gait Phase Detection Using Lower-Limb Wearable Sensor Data
by Muhammad Fiaz, Rosita Guido and Domenico Conforti
Appl. Sci. 2026, 16(3), 1397; https://doi.org/10.3390/app16031397 - 29 Jan 2026
Viewed by 97
Abstract
Accurate gait-phase detection is essential for rehabilitation monitoring, prosthetic control, and human–robot interaction. Artificial intelligence supports continuous, personalized mobility assessment by extracting clinically meaningful patterns from wearable sensors. A richer view of gait dynamics can be achieved by integrating additional signals, including inertial, plantar flex, footswitch, and EMG data, leading to more accurate and informative gait analysis. Motivated by these needs, this study investigates discrete gait-phase recognition for the right leg using a multi-subject IMU dataset collected from lower-limb sensors. IMU recordings were segmented into 128-sample windows across 23 channels, and each window was flattened into a 2944-dimensional feature vector. To ensure reliable ground-truth labels, we developed an automatic relabeling pipeline incorporating heel-strike and toe-off detection, adaptive threshold tuning, and sensor fusion across sensor modalities. These windowed vectors were then used to train a comprehensive suite of machine learning models, including Random Forests, Extra Trees, k-Nearest Neighbors, XGBoost, and LightGBM. All models underwent systematic hyperparameter tuning, and their performance was assessed through k-fold cross-validation. The results demonstrate that tree-based ensemble models provide accurate and stable gait-phase classification with accuracy exceeding 97% across both test sets, underscoring their potential for future real-time gait analysis and lower-limb assistive technologies. Full article
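
As a rough illustration of the windowing step this abstract describes (128-sample windows over 23 channels, flattened to 2944-dimensional vectors, then a tree ensemble with k-fold cross-validation), the sketch below uses synthetic stand-ins for the IMU stream and gait-phase labels; the channel count layout, class codes, and model settings are assumptions, not the study's data or configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_channels, window = 12_800, 23, 128    # 23 channels, 128-sample windows
signal = rng.normal(size=(n_samples, n_channels))  # placeholder for the IMU stream
phase = rng.integers(0, 4, size=n_samples)         # placeholder per-sample gait-phase codes

# Segment into non-overlapping windows and flatten: 128 * 23 = 2944 features per window.
n_windows = n_samples // window
X = signal[: n_windows * window].reshape(n_windows, window * n_channels)
# One label per window, here the majority phase inside the window.
y = np.array([np.bincount(phase[i * window:(i + 1) * window]).argmax()
              for i in range(n_windows)])

clf = RandomForestClassifier(n_estimators=300, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)          # k-fold cross-validation
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```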

37 pages, 8656 KB  
Article
Anomaly-Aware Graph-Based Semi-Supervised Deep Support Vector Data Description for Anomaly Detection
by Taha J. Alhindi
Mathematics 2025, 13(24), 3987; https://doi.org/10.3390/math13243987 - 14 Dec 2025
Viewed by 532
Abstract
Anomaly detection in safety-critical systems often operates under severe label constraints, where only a small subset of normal and anomalous samples can be reliably annotated, while large unlabeled data streams are contaminated and high-dimensional. Deep one-class methods, such as deep support vector data description (DeepSVDD) and deep semi-supervised anomaly detection (DeepSAD), address this setting. However, they treat samples largely in isolation and do not explicitly leverage the manifold structure of unlabeled data, which can limit robustness and interpretability. This paper proposes Anomaly-Aware Graph-based Semi-Supervised Deep Support Vector Data Description (AAG-DSVDD), a boundary-focused deep one-class approach that couples a DeepSAD-style hypersphere with a label-aware latent k-nearest neighbor (k-NN) graph. The method combines a soft-boundary enclosure for labeled normals, a margin-based push-out for labeled anomalies, an unlabeled center-pull, and a k-NN graph regularizer on the squared distances to the center. The resulting graph term propagates information from scarce labels along the latent manifold, aligns anomaly scores of neighboring samples, and supports sample-level interpretability through graph neighborhoods, while test-time scoring remains a single distance-to-center computation. On a controlled two-dimensional synthetic dataset, AAG-DSVDD achieves a mean F1-score of 0.88±0.02 across ten random splits, improving on the strongest baseline by about 0.12 absolute F1. On three public benchmark datasets (Thyroid, Arrhythmia, and Heart), AAG-DSVDD attains the highest F1 on all datasets with F1-scores of 0.719, 0.675, and 0.8, respectively, compared to all baselines. In a multi-sensor fire monitoring case study, AAG-DSVDD reduces the average absolute error in fire starting time to approximately 473 s (about 30% improvement over DeepSAD) while keeping the average pre-fire false-alarm rate below 1% and avoiding persistent pre-fire alarms. These results indicate that graph-regularized deep one-class boundaries offer an effective and interpretable framework for semi-supervised anomaly detection under realistic label budgets. Full article
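
The composite objective described here (soft-boundary enclosure for labeled normals, margin-based push-out for labeled anomalies, a center-pull for unlabeled points, and a k-NN graph penalty on squared distances to the center) can be pictured with a short PyTorch sketch. It is an illustration of those terms under assumed weighting constants, not the authors' AAG-DSVDD implementation.

```python
import torch

def aag_dsvdd_style_loss(z, y, center, radius, edge_index, margin=1.0, lam_graph=0.1):
    """z: (n, d) latent codes; y: +1 labeled normal, -1 labeled anomaly, 0 unlabeled.
    edge_index: (2, m) tensor of k-NN neighbor pairs in latent space.
    margin and lam_graph are assumed constants, not values from the paper."""
    d2 = ((z - center) ** 2).sum(dim=1)          # squared distance to center (anomaly score)
    zero = z.sum() * 0                           # differentiable zero for empty label groups

    normal = y == 1                              # soft-boundary enclosure for labeled normals
    loss_normal = torch.relu(d2[normal] - radius ** 2).mean() if normal.any() else zero

    anom = y == -1                               # margin-based push-out for labeled anomalies
    loss_anom = torch.relu(radius ** 2 + margin - d2[anom]).mean() if anom.any() else zero

    unl = y == 0                                 # center-pull for unlabeled samples
    loss_unl = d2[unl].mean() if unl.any() else zero

    i, j = edge_index                            # graph term: neighbors get similar scores
    loss_graph = ((d2[i] - d2[j]) ** 2).mean()

    return loss_normal + loss_anom + loss_unl + lam_graph * loss_graph

# Tiny usage example on random latent codes and a random neighbor list.
z = torch.randn(32, 8, requires_grad=True)
y = torch.randint(-1, 2, (32,))
edge_index = torch.randint(0, 32, (2, 64))
loss = aag_dsvdd_style_loss(z, y, center=torch.zeros(8), radius=1.0, edge_index=edge_index)
loss.backward()
print(float(loss))
```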

20 pages, 5100 KB  
Article
A Supervised Learning Approach for Accurate and Efficient Identification of Chikungunya Virus Lineages and Signature Mutations
by Miao Miao, Yameng Fan, Jiao Tan, Xiaobin Hu, Yonghong Ma, Guangdi Li and Ke Men
Biology 2025, 14(12), 1736; https://doi.org/10.3390/biology14121736 - 4 Dec 2025
Viewed by 545
Abstract
Chikungunya virus (CHIKV) poses a significant public health threat, and its continuous evolution necessitates high-resolution genomic surveillance. Current methods lack the speed and resolution to efficiently discriminate sub-lineages. To address this, we developed CHIKVGenotyper, an interpretable machine learning framework for high-resolution CHIKV lineage classification. This study leveraged a comprehensive dataset of 6886 CHIKV genome sequences, from which a high-quality set of 3014 sequences was established for model development. A hierarchical assignment pipeline that integrated a probability-based sequence matching model, machine learning refinement, and phylogenetic validation was developed to assign high-confidence labels across eight CHIKV lineages, thereby constructing a reliable dataset for subsequent analysis. Multiple machine learning models were trained and evaluated, with the optimal Random Forest model achieving near-perfect accuracy (F1-score: 99.53%) on high-coverage whole-genome test data and maintaining robust performance (F1-score: 96.50%) on an independent low-coverage set. The E2 glycoprotein alone yielded comparable accuracy (F1-score: 99.52%), highlighting its discriminative power. SHapley Additive exPlanations (SHAP) analysis identified key lineage-defining amino acid mutations, such as E1-K211E and E2-V264A, for the Indian Ocean Lineage, which were corroborated by established biological knowledge. This work provides an accurate, scalable, and interpretable tool for CHIKV molecular epidemiology, offering insights into viral evolution and aiding outbreak response. Full article
(This article belongs to the Section Bioinformatics)
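
A minimal sketch of a lineage classifier in the spirit of this abstract is shown below, using simple k-mer counts as sequence features; the toy sequences, lineage names, and 4-mer encoding are placeholders, and the paper's hierarchical assignment pipeline and SHAP analysis are not reproduced.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Toy stand-ins for curated genome sequences and their assigned lineage labels.
sequences = ["ATGGCGTACGTT", "ATGGCATACGTA", "TTGGCGTACGAA", "ATGGCGTTCGTT"] * 50
lineages  = ["ECSA", "IOL", "Asian", "WA"] * 50

kmers = CountVectorizer(analyzer="char", ngram_range=(4, 4))   # overlapping 4-mers
X = kmers.fit_transform(sequences)
X_tr, X_te, y_tr, y_te = train_test_split(X, lineages, test_size=0.2, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("macro F1:", f1_score(y_te, rf.predict(X_te), average="macro"))
```

From there, shap.TreeExplainer could in principle attribute the forest's predictions to individual k-mers, in the spirit of the SHAP analysis described above.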

24 pages, 5563 KB  
Article
Using K-Means-Derived Pseudo-Labels and Machine Learning Classification on Sentinel-2 Imagery to Delineate Snow Cover Ratio and Snowline Altitude: A Case Study on White Glacier from 2019 to 2024
by Wai Yin (Wilson) Cheung and Laura Thomson
Remote Sens. 2025, 17(23), 3872; https://doi.org/10.3390/rs17233872 - 29 Nov 2025
Viewed by 490
Abstract
Accurate equilibrium-line altitude (ELA) estimates are a valuable proxy for evaluating glacier mass balance conditions and interpreting climate-driven change in the Canadian high Arctic, where sustained in situ observations are limited. A scalable remote-sensing framework is evaluated to extract the snow cover ratio (SCR) and snowline altitude (SLA) on White Glacier (Axel Heiberg Island, Nunavut) and to assess the agreement with in situ ELA measurements. Ten-metre Sentinel-2 imagery (2019–2024) is processed with a hybrid pipeline comprising the principal component analysis (PCA) of four bands (B2, B3, B4, and B8), unsupervised K-means for pseudo-label generation, and a Random Forest (RF) classifier for snow/ice/ground mapping. SLA is defined based on the date of seasonal minimum SCR using (i) a snowline pixel elevation histogram (SPEH; mode) and (ii) elevation binning with SCR thresholds (0.5 and 0.8). Validation against field-derived ELAs (2019–2023) is performed; formal SLA precision from DEM and binning is quantified (±4.7 m), and associations with positive degree days (PDDs) at Eureka are examined. The RF classifier reproduces the spectral clustering structure with >99.9% fidelity. Elevation binning at the 0.8 SCR threshold yields SLAs closely matching field ELAs (Pearson r = 0.994, p = 0.0006; RMSE = 30 m), whereas SPEH and lower-threshold binning are less accurate. Interannual variability is pronounced: the minimum SCR spans 0.46–0.76 and co-varies with SLA, while correlations with PDDs are positive but modest. Results indicate that high-threshold elevation-bin filtering with machine learning provides a reliable proxy for ELA in clean-ice settings, with potential transferability to other data-sparse Arctic sites, while underscoring the importance of image timing and mixed-pixel effects in residual SLA–ELA differences. Full article
(This article belongs to the Special Issue AI-Driven Mapping Using Remote Sensing Data)
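
The pseudo-labeling pipeline summarized above (PCA of the four bands, K-means pseudo-labels, a Random Forest classifier, then snowline altitude from elevation binning at an SCR threshold) is sketched below on synthetic pixel values; the reflectance statistics, DEM values, bin width, and snow cluster id are assumptions, not the study's data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
pixels = rng.normal(size=(5000, 4))                 # reflectance in B2, B3, B4, B8
elevation = rng.uniform(100, 1800, size=5000)       # placeholder DEM values (m)

components = PCA(n_components=3).fit_transform(pixels)
pseudo = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(components)
rf = RandomForestClassifier(n_estimators=100, random_state=1).fit(pixels, pseudo)
classes = rf.predict(pixels)                        # snow / ice / ground cluster ids

# Snowline altitude via elevation binning: lowest 50 m bin whose snow-cover ratio
# exceeds a threshold (0.8 in the best-performing configuration reported above).
snow_id = 0                                         # assumed cluster id for snow
bins = np.arange(100, 1800, 50)
which = np.digitize(elevation, bins)
scr = np.array([(classes[which == b] == snow_id).mean() if (which == b).any() else 0.0
                for b in range(1, len(bins) + 1)])
sla = bins[np.argmax(scr >= 0.8)] if (scr >= 0.8).any() else None
print("estimated snowline altitude:", sla, "m")
```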

29 pages, 2154 KB  
Article
A Lightweight Training Approach for MITM Detection in IoT Networks: Time-Window Selection and Generalization
by Yi-Min Yang, Ko-Chin Chang and Jia-Ning Luo
Appl. Sci. 2025, 15(22), 12147; https://doi.org/10.3390/app152212147 - 16 Nov 2025
Viewed by 485
Abstract
IoT devices have been adopted at massive scale, but they come with their own share of security vulnerabilities. One such issue is the ARP spoofing attack, which allows a man-in-the-middle to intercept packets and modify the communication, and can give an intruder access to the user’s entire local area network. The ACI-IoT-2023 dataset captures ARP spoofing attacks, yet its absence of specified extracted features hinders its application in machine learning-aided intrusion detection systems. To address this, we present a framework for ARP spoofing detection which improves the dataset by extracting ARP-specific features and evaluating their impact under different time-window configurations. Beyond generic feature engineering and model evaluation, we contribute by treating ARP spoofing as a time-window pattern and aligning the window length with observed spoofing persistence from the dataset timesheet, turning window choice into an explainable, repeatable setting for constrained IoT devices; by standardizing deployment-oriented efficiency profiling (inference latency, RAM usage, and model size) reported alongside accuracy, precision, recall, and F1-scores to enable edge-feasible model selection; and by providing an ARP-focused, reproducible pipeline that reconstructs L2 labels from public PCAPs and derives missing link-layer indicators, yielding a transparent path from labeling to windowed features to training and evaluation. Our research systematically analyzes five models across multiple time-windows: Decision Tree, Random Forest, XGBoost, CatBoost, and K-Nearest Neighbors. This study shows that XGBoost and CatBoost provide the best performance at the 1800 s window, which corresponds to the longest spoofing duration in the timesheet, achieving accuracy greater than 0.93, precision above 0.95, recall near 0.91, and F1-scores above 0.93. Although the Decision Tree has the lowest inference latency (∼0.4 ms), its lower recall risks missed attacks. By contrast, XGBoost and CatBoost sustain strong detection with less than 6 ms inference latency and moderate RAM usage, indicating practicality for IoT deployment. We also observe diminishing returns beyond ∼1800 s due to temporal over-aggregation. Full article
(This article belongs to the Special Issue Machine Learning and Its Application for Anomaly Detection)
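
The time-window framing and deployment-oriented profiling described above can be illustrated with the short pandas/scikit-learn sketch below; the per-packet fields, the synthetic traffic, the spoofed-window assignment, and the aggregated features are placeholders rather than the ACI-IoT-2023 feature set, with only the 1800 s window length taken from the abstract.

```python
import time
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(2)
n_packets = 20_000
timestamps = rng.uniform(0, 36_000, n_packets)              # 10 h of synthetic traffic
windows = (timestamps // 1800).astype(int)                  # 1800 s windows, as above
spoofed = rng.choice(20, size=6, replace=False)             # windows that contain spoofing
packets = pd.DataFrame({
    "window": windows,
    "is_arp_reply": rng.integers(0, 2, n_packets),          # placeholder ARP-specific fields
    "src_mac_changes": rng.integers(0, 3, n_packets),
    "label": np.isin(windows, spoofed).astype(int),
})

# Aggregate per-window features and a per-window label.
agg = packets.groupby("window").agg(arp_replies=("is_arp_reply", "sum"),
                                    mac_changes=("src_mac_changes", "sum"),
                                    label=("label", "max"))
X, y = agg[["arp_replies", "mac_changes"]], agg["label"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
start = time.perf_counter()
pred = model.predict(X_te)
latency_ms = (time.perf_counter() - start) / len(X_te) * 1000
print(f"F1 = {f1_score(y_te, pred):.2f}, per-window inference latency ~ {latency_ms:.3f} ms")
```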

20 pages, 2314 KB  
Article
Explainable AI-Driven Raman Spectroscopy for Rapid Bacterial Identification
by Dimitris Kalatzis, Angeliki I. Katsafadou, Dimitrios Chatzopoulos, Charalambos Billinis and Yiannis Kiouvrekis
Micro 2025, 5(4), 46; https://doi.org/10.3390/micro5040046 - 14 Oct 2025
Cited by 1 | Viewed by 1672
Abstract
Raman spectroscopy is a rapid, label-free, and non-destructive technique for probing molecular structures, making it a powerful tool for clinical pathogen identification. However, interpreting its complex spectral data remains challenging. In this study, we evaluate and compare a suite of machine learning models—including Support Vector Machines (SVM), XGBoost, LightGBM, Random Forests, k-nearest Neighbors (k-NN), Convolutional Neural Networks (CNNs), and fully connected Neural Networks—with and without Principal Component Analysis (PCA) for dimensionality reduction. Using Raman spectral data from 30 clinically important bacterial and fungal species that collectively account for over 90% of human infections in hospital settings, we conducted rigorous hyperparameter tuning and assessed model performance based on accuracy, precision, recall, and F1-score. The SVM with an RBF kernel combined with PCA emerged as the top-performing model, achieving the highest accuracy (0.9454) and F1-score (0.9454). Ensemble methods such as LightGBM and XGBoost also demonstrated strong performance, while CNNs provided competitive results among deep learning approaches. Importantly, interpretability was achieved via SHAP (Shapley Additive exPlanations), which identified class-specific Raman wavenumber regions critical to prediction. These interpretable insights, combined with strong classification performance, underscore the potential of explainable AI-driven Raman analysis to accelerate clinical microbiology diagnostics, optimize antimicrobial therapy, and improve patient outcomes. Full article
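
A compact sketch of the best-performing configuration reported here (PCA followed by an RBF-kernel SVM with hyperparameter search) is given below on synthetic spectra; the number of wavenumber bins, the component count, and the grid values are illustrative assumptions, not the study's settings.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 1000))          # 600 spectra x 1000 wavenumber bins (placeholders)
y = rng.integers(0, 30, size=600)         # 30 bacterial/fungal species

pipe = make_pipeline(StandardScaler(), PCA(n_components=50), SVC(kernel="rbf"))
search = GridSearchCV(pipe, {"svc__C": [1, 10, 100], "svc__gamma": ["scale", 1e-3]},
                      cv=3, scoring="f1_macro")
search.fit(X, y)
print("best params:", search.best_params_, "CV macro-F1:", round(search.best_score_, 3))
```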

19 pages, 1339 KB  
Article
Convolutional Graph Network-Based Feature Extraction to Detect Phishing Attacks
by Saif Safaa Shakir, Leyli Mohammad Khanli and Hojjat Emami
Future Internet 2025, 17(8), 331; https://doi.org/10.3390/fi17080331 - 25 Jul 2025
Viewed by 1762
Abstract
Phishing attacks pose significant risks to security, drawing considerable attention from both security professionals and customers. Despite extensive research, the current phishing website detection mechanisms often fail to efficiently diagnose unknown attacks due to their poor performances in the feature selection stage. Many techniques suffer from overfitting when working with huge datasets. To address this issue, we propose a feature selection strategy based on a convolutional graph network, which utilizes a dataset containing both labels and features, along with hyperparameters for a Support Vector Machine (SVM) and a graph neural network (GNN). Our technique consists of three main stages: (1) preprocessing the data by dividing them into testing and training sets, (2) constructing a graph from pairwise feature distances using the Manhattan distance and adding self-loops to nodes, and (3) implementing a GraphSAGE model with node embeddings and training the GNN by updating the node embeddings through message passing from neighbors, calculating the hinge loss, applying the softmax function, and updating weights via backpropagation. Additionally, we compute the neighborhood random walk (NRW) distance using a random walk with restart to create an adjacency matrix that captures the node relationships. The node features are ranked based on gradient significance to select the top k features, and the SVM is trained using the selected features, with the hyperparameters tuned through cross-validation. We evaluated our model on a test set, calculating the performance metrics and validating the effectiveness of the PhishGNN dataset. Our model achieved a precision of 90.78%, an F1-score of 93.79%, a recall of 97%, and an accuracy of 93.53%, outperforming the existing techniques. Full article
(This article belongs to the Section Cybersecurity)
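
A heavily simplified, self-contained sketch of the workflow described above follows: a k-NN graph built from pairwise Manhattan distances with self-loops, one mean-aggregation (GraphSAGE-style) message-passing step trained with a hinge-style loss, gradient-based ranking of the input features, and an SVM trained on the top-k. It is an illustration of the idea, not the PhishGNN implementation; the layer sizes, neighbor count, and the value of k are assumptions.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X_np = rng.normal(size=(300, 20)).astype(np.float32)    # 300 URLs x 20 candidate features
y_np = rng.integers(0, 2, size=300)                     # 1 = phishing

# k-NN graph from pairwise Manhattan distances, with self-loops.
D = np.abs(X_np[:, None, :] - X_np[None, :, :]).sum(-1)
np.fill_diagonal(D, np.inf)
neigh = np.argsort(D, axis=1)[:, :5]                    # 5 nearest neighbors per node
A = np.zeros((300, 300), dtype=np.float32)
A[np.arange(300)[:, None], neigh] = 1.0
np.fill_diagonal(A, 1.0)                                # self-loops
A /= A.sum(1, keepdims=True)                            # row-normalized mean aggregation

X = torch.tensor(X_np, requires_grad=True)
A_t, y = torch.tensor(A), torch.tensor(y_np, dtype=torch.long)

model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    X.grad = None
    logits = model(A_t @ X)                             # one mean-aggregation step
    loss = F.multi_margin_loss(logits, y)               # multi-class hinge loss
    loss.backward()
    opt.step()

# Rank input features by the magnitude of the last loss gradient w.r.t. the inputs.
importance = X.grad.abs().mean(0).numpy()
top_k = np.argsort(importance)[::-1][:8]
print("CV accuracy of an SVM on the top-8 features:",
      cross_val_score(SVC(kernel="rbf"), X_np[:, top_k], y_np, cv=5).mean().round(3))
```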

22 pages, 2775 KB  
Article
Surface Broadband Radiation Data from a Bipolar Perspective: Assessing Climate Change Through Machine Learning
by Alice Cavaliere, Claudia Frangipani, Daniele Baracchi, Maurizio Busetto, Angelo Lupi, Mauro Mazzola, Simone Pulimeno, Vito Vitale and Dasara Shullani
Climate 2025, 13(7), 147; https://doi.org/10.3390/cli13070147 - 13 Jul 2025
Cited by 1 | Viewed by 1085
Abstract
Clouds modulate the net radiative flux that interacts with both shortwave (SW) and longwave (LW) radiation, but the uncertainties regarding their effect in polar regions are especially high because ground observations are lacking and evaluation through satellites is made difficult by high surface reflectance. In this work, sky conditions for six different polar stations, two in the Arctic (Ny-Ålesund and Utqiagvik [formerly Barrow]) and four in Antarctica (Neumayer, Syowa, South Pole, and Dome C) will be presented, considering the decade between 2010 and 2020. Measurements of broadband SW and LW radiation components (both downwelling and upwelling) are collected within the frame of the Baseline Surface Radiation Network (BSRN). Sky conditions—categorized as clear sky, cloudy, or overcast—were determined using cloud fraction estimates obtained through the RADFLUX method, which integrates shortwave (SW) and longwave (LW) radiative fluxes. RADFLUX was applied with daily fitting for all BSRN stations, producing two cloud fraction values: one derived from shortwave downward (SWD) measurements and the other from longwave downward (LWD) measurements. The variation in cloud fraction used to classify conditions from clear sky to overcast appeared consistent and reasonable when compared to seasonal changes in shortwave downward (SWD) and diffuse radiation (DIF), as well as longwave downward (LWD) and longwave upward (LWU) fluxes. These classifications served as labels for a machine learning-based classification task. Three algorithms were evaluated: Random Forest, K-Nearest Neighbors (KNN), and XGBoost. Input features include downward LW radiation, solar zenith angle, surface air temperature (Ta), relative humidity, and the ratio of water vapor pressure to Ta. Among these models, XGBoost achieved the highest balanced accuracy, with the best scores of 0.78 at Ny-Ålesund (Arctic) and 0.78 at Syowa (Antarctica). The evaluation employed a leave-one-year-out approach to ensure robust temporal validation. Finally, the results from cross-station models highlighted the need for deeper investigation, particularly through clustering stations with similar environmental and climatic characteristics to improve generalization and transferability across locations. Additionally, the use of feature normalization strategies proved effective in reducing inter-station variability and promoting more stable model performance across diverse settings. Full article
(This article belongs to the Special Issue Addressing Climate Change with Artificial Intelligence Methods)
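
The leave-one-year-out evaluation described above can be sketched with scikit-learn's LeaveOneGroupOut splitter; the five input features and three sky-condition labels below are synthetic stand-ins for the BSRN/RADFLUX data, and the XGBoost settings are assumptions.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(5)
n = 4000
X = rng.normal(size=(n, 5))                     # LWD, solar zenith angle, Ta, RH, e/Ta
y = rng.integers(0, 3, size=n)                  # clear sky / cloudy / overcast
years = rng.integers(2010, 2021, size=n)        # grouping variable for the temporal split

logo = LeaveOneGroupOut()                       # each fold holds out one full year
model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
scores = cross_val_score(model, X, y, groups=years, cv=logo,
                         scoring="balanced_accuracy")
print("balanced accuracy per held-out year:", np.round(scores, 2))
```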

22 pages, 6402 KB  
Article
A Study on Airborne Hyperspectral Tree Species Classification Based on the Synergistic Integration of Machine Learning and Deep Learning
by Dabing Yang, Jinxiu Song, Chaohua Huang, Fengxin Yang, Yiming Han and Ruirui Wang
Forests 2025, 16(6), 1032; https://doi.org/10.3390/f16061032 - 19 Jun 2025
Viewed by 1403
Abstract
Against the backdrop of global climate change and increasing ecological pressure, the refined monitoring of forest resources and accurate tree species identification have become essential tasks for sustainable forest management. Hyperspectral remote sensing, with its high spectral resolution, shows great promise in tree species classification. However, traditional methods face limitations in extracting joint spatial–spectral features, particularly in complex forest environments, due to the “curse of dimensionality” and the scarcity of labeled samples. To address these challenges, this study proposes a synergistic classification approach that combines the spatial feature extraction capabilities of deep learning with the generalization advantages of machine learning. Specifically, a 2D convolutional neural network (2DCNN) is integrated with a support vector machine (SVM) classifier to enhance classification accuracy and model robustness under limited sample conditions. Using UAV-based hyperspectral imagery collected from a typical plantation area in Fuzhou City, Jiangxi Province, and ground-truth data for labeling, a highly imbalanced sample split strategy (1:99) is adopted. The 2DCNN is further evaluated in conjunction with six classifiers—CatBoost, decision tree (DT), k-nearest neighbors (KNN), LightGBM, random forest (RF), and SVM—for comparison. The 2DCNN-SVM combination is identified as the optimal model. In the classification of Masson pine, Chinese fir, and eucalyptus, this method achieves an overall accuracy (OA) of 97.56%, average accuracy (AA) of 97.47%, and a Kappa coefficient of 0.9665, significantly outperforming traditional approaches. The results demonstrate that the 2DCNN-SVM model offers superior feature representation and generalization capabilities in high-dimensional, small-sample scenarios, markedly improving tree species classification accuracy in complex forest settings. This study validates the model’s potential for application in small-sample forest remote sensing and provides theoretical support and technical guidance for high-precision tree species identification and dynamic forest monitoring. Full article
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
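
A minimal sketch of the 2DCNN-SVM idea follows: a small 2D convolutional network acts as a spatial-spectral feature extractor on hyperspectral patches, and an SVM performs the final classification. Patch size, band count, architecture, and split ratio are placeholders, and the CNN is left untrained here purely for brevity; in practice it would first be trained on the labeled patches.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(6)
patches = rng.normal(size=(400, 30, 9, 9)).astype(np.float32)  # (N, bands, H, W) patches
species = rng.integers(0, 3, size=400)                         # pine / fir / eucalyptus codes

cnn = nn.Sequential(                       # feature extractor; untrained here for brevity
    nn.Conv2d(30, 32, kernel_size=3),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
with torch.no_grad():
    feats = cnn(torch.from_numpy(patches)).numpy()             # (N, 64) deep features

X_tr, X_te, y_tr, y_te = train_test_split(feats, species, test_size=0.3, random_state=0)
svm = SVC(kernel="rbf").fit(X_tr, y_tr)                        # SVM on CNN features
print("test accuracy:", accuracy_score(y_te, svm.predict(X_te)))
```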

35 pages, 30272 KB  
Article
Machine-Learning-Based Integrated Mining Big Data and Multi-Dimensional Ore-Forming Prediction: A Case Study of Yanshan Iron Mine, Hebei, China
by Yuhao Chen, Gongwen Wang, Nini Mou, Leilei Huang, Rong Mei and Mingyuan Zhang
Appl. Sci. 2025, 15(8), 4082; https://doi.org/10.3390/app15084082 - 8 Apr 2025
Cited by 3 | Viewed by 3280
Abstract
With the rapid development of big data and artificial intelligence technologies, the era of Industry 4.0 has driven large open-pit mines towards digital and intelligent transformation. This is particularly true in mature mining areas such as the Yanshan Iron Mine, where the depletion of shallow proven reserves and the growing problem of surrounding rock mixed with the shallow ore bodies make it increasingly important to build intelligent mines and implement green and sustainable development strategies. However, previous mineralization predictions for the Yanshan Iron Mine largely relied on traditional geological data (such as blasting rock powder and borehole profiles), exploration reports, or three-dimensional explicit ore body models, which lacked precision and were insufficient to meet the requirements for intelligent mine construction. Therefore, this study, based on artificial intelligence technology, focuses on geoscience big data mining and quantitative prediction, with the goal of achieving multi-scale, multi-dimensional, and multi-modal precise positioning of the Yanshan Iron Mine and establishing its intelligent mine technology system. The specific research contents and results are as follows: (1) This study collected and organized multi-source geoscience data for the Yanshan Iron Mine, including geological, geophysical, and remote sensing data, such as mine drilling data, centimeter-level drone image data, and hyperspectral data of rocks and minerals, establishing a rich mine big data set. (2) SOM clustering analysis was performed on the elemental data of rock and mineral samples, identifying Mg, Al, Si, S, K, Ca, and Mn as the key elements positively correlated with iron. TSG was used to interpret shortwave and thermal infrared hyperspectral data of the samples, identifying the main alteration mineral types in the mining area. Combined with spectral and elemental analysis, the universality of alteration features such as chloritization and carbonation, which are closely related to the mineralization process, was further verified. (3) Based on the spectral and elemental grade data of rock and mineral samples, a training model for ore grade–spectrum correlation was constructed using Random Forests, Support Vector Machines, and other algorithms, with the SMOTE algorithm applied to balance positive and negative samples. This model was then applied to centimeter-level drone images, achieving high-precision intelligent identification of magnetite in the mining area. Combined with LiDAR image elevation data, a real-time three-dimensional surface mineral monitoring model for the mining area was built. (4) The Bagged Positive Label Unlabeled Learning (BPUL) method was adopted to integrate five evidence maps (carbonate alteration, chloritization, mixed rockization, fault zones, and magnetic anomalies) to conduct three-dimensional mineralization prediction analysis for the mining area. The locations of key target areas were delineated. The SHAP index and three-dimensional explicit geological models were used to conduct an in-depth analysis of the contributions of different feature variables in the mineralization process of the Yanshan Iron Mine. In conclusion, this study successfully constructed the technical framework for intelligent mine construction at the Yanshan Iron Mine, providing important theoretical and practical support for mineralization prediction and intelligent exploration in the mining area. Full article
(This article belongs to the Special Issue Green Mining: Theory, Methods, Computation and Application)
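
The bagged positive-unlabeled step at the heart of the BPUL prediction in point (4) can be sketched as below, with synthetic evidence-layer values: known mineralized cells are the positives, pseudo-negatives are repeatedly drawn from the unlabeled cells, and prospectivity is the averaged score over the bag. The bag count, sample sizes, and Random Forest settings are assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
n_cells = 5000
X = rng.normal(size=(n_cells, 5))                       # 5 evidence-layer values per 3D cell
pos_idx = rng.choice(n_cells, 150, replace=False)       # known ore-bearing cells (positives)
unl_idx = np.setdiff1d(np.arange(n_cells), pos_idx)     # everything else is unlabeled

scores = np.zeros(n_cells)
n_bags = 20
for b in range(n_bags):
    neg_idx = rng.choice(unl_idx, size=len(pos_idx), replace=False)  # pseudo-negatives
    idx = np.concatenate([pos_idx, neg_idx])
    y = np.concatenate([np.ones(len(pos_idx)), np.zeros(len(neg_idx))])
    clf = RandomForestClassifier(n_estimators=100, random_state=b).fit(X[idx], y)
    scores += clf.predict_proba(X)[:, 1]                # score every cell with this bag's model

prospectivity = scores / n_bags                         # averaged PU score per cell
print("top target cells:", np.argsort(prospectivity)[::-1][:10])
```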

20 pages, 6977 KB  
Article
A Hybrid Model for Psoriasis Subtype Classification: Integrating Multi Transfer Learning and Hard Voting Ensemble Models
by İsmail Anıl Avcı, Merve Zirekgür, Barış Karakaya and Betül Demir
Diagnostics 2025, 15(1), 55; https://doi.org/10.3390/diagnostics15010055 - 28 Dec 2024
Cited by 2 | Viewed by 1924
Abstract
Background: Psoriasis is a chronic, immune-mediated skin disease characterized by lifelong persistence and fluctuating symptoms. The clinical similarities among its subtypes and the diversity of symptoms present challenges in diagnosis. Early diagnosis plays a vital role in preventing the spread of lesions and improving patients’ quality of life. Methods: This study proposes a hybrid model combining multiple transfer learning and ensemble learning methods to classify psoriasis subtypes accurately and efficiently. The dataset includes 930 images labeled by expert dermatologists from the Dermatology Clinic of Fırat University Hospital, representing four distinct subtypes: generalized, guttate, plaque, and pustular. Class imbalance was addressed by applying synthetic data augmentation techniques, particularly for the rare subtype. To reduce the influence of nonlesion environmental factors, the images underwent systematic cropping and preprocessing steps, such as Gaussian blur, thresholding, morphological operations, and contour detection. DenseNet-121, EfficientNet-B0, and ResNet-50 transfer learning models were utilized to extract feature vectors, which were then combined to form a unified feature set representing the strengths of each model. The feature set was divided into 80% training and 20% testing subsets and evaluated using a hard voting classifier consisting of logistic regression, random forest, support vector classifier, k-nearest neighbors, and gradient boosting algorithms. Results: The proposed hybrid approach achieved 93.14% accuracy, 96.75% precision, and an F1 score of 91.44%, demonstrating superior performance compared to individual transfer learning models. Conclusions: This method offers significant potential to enhance the classification of psoriasis subtypes in clinical and real-world settings. Full article
(This article belongs to the Special Issue Classification of Diseases Using Machine Learning Algorithms)
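
The fusion-and-voting stage described here is sketched below: per-image feature vectors from three backbones are concatenated into one representation and classified by a hard-voting ensemble of the five listed algorithms. The backbone features are random placeholders with reduced dimensions, not real DenseNet-121 / EfficientNet-B0 / ResNet-50 activations, and the split ratio matches the 80/20 division in the abstract.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(8)
n = 930                                               # images, as in the dataset above
densenet_f, effnet_f, resnet_f = (rng.normal(size=(n, d)) for d in (128, 160, 256))
X = np.hstack([densenet_f, effnet_f, resnet_f])       # unified (concatenated) feature set
y = rng.integers(0, 4, size=n)                        # generalized / guttate / plaque / pustular

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
vote = VotingClassifier(voting="hard", estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svc", SVC()),
    ("knn", KNeighborsClassifier()),
    ("gb", GradientBoostingClassifier()),
]).fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, vote.predict(X_te)))
```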

17 pages, 564 KB  
Article
Clade Size Statistics Under Ford’s α-Model
by Antonio Di Nunzio and Filippo Disanto
Mathematics 2024, 12(24), 3974; https://doi.org/10.3390/math12243974 - 18 Dec 2024
Viewed by 871
Abstract
Given a labeled tree topology t of n taxa, consider a population P of k leaves chosen among those of t. The clade of P is the minimal subtree P̂ of t containing P, and its size |P̂| is provided by the number of leaves in the clade. We study distributive properties of the clade size variable |P̂| considered over labeled topologies of size n generated at random in the framework of Ford’s α-model. Under this model, starting from the one-taxon labeled topology, a random labeled topology is produced iteratively by a sequence of α-insertions, each of which adds a pendant edge to either a pendant or internal edge of a labeled topology, with a probability that depends on the parameter α ∈ [0,1]. Different values of α determine different probability distributions over the set of labeled topologies of given size n, with the special cases α = 0 and α = 1/2 respectively corresponding to the Yule and uniform distributions. In the first part of the manuscript, we consider a labeled topology t of size n generated by a sequence of random α-insertions starting from a fixed labeled topology t′ of given size k, and determine the probability mass function, mean, and variance of the clade size |P̂| in t when P is chosen as the set of leaves of t inherited from t′. In the second part of the paper, we calculate the probability that a set P of k leaves chosen at random in a Ford-distributed labeled topology of size n is monophyletic, that is, the probability that |P̂| = k. Our investigations extend previous results on clade size statistics obtained for Yule and uniformly distributed labeled topologies. Full article
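
A Monte Carlo sketch of the model, assuming the standard α-insertion weights (each pendant edge weighted 1 − α, each internal edge, including the root edge, weighted α), is given below; it grows random topologies from a two-leaf start for simplicity and estimates the mean clade size and monophyly probability of a random k-leaf set, as an illustration rather than the paper's analytical derivation.

```python
import random

class Node:
    def __init__(self, parent=None, leaf=False):
        self.parent, self.children, self.leaf = parent, [], leaf

def grow_ford_tree(n, alpha, rng):
    """Grow a labeled topology with n leaves by alpha-insertions (from a 2-leaf start)."""
    root = Node()
    root.children = [Node(root, leaf=True), Node(root, leaf=True)]
    nodes = [root] + root.children            # one edge per node (edge to its parent/origin)
    for _ in range(n - 2):
        # Pendant edges (leaf below) get weight 1 - alpha, internal edges weight alpha.
        weights = [(1 - alpha) if v.leaf else alpha for v in nodes]
        v = rng.choices(nodes, weights=weights, k=1)[0]
        u, w = v.parent, Node()               # subdivide the chosen edge with node w
        x = Node(w, leaf=True)                # the new pendant leaf
        w.parent, w.children, v.parent = u, [v, x], w
        if u is not None:
            u.children[u.children.index(v)] = w
        nodes += [w, x]
    return [v for v in nodes if v.leaf]

def ancestors(v):
    path = []
    while v is not None:
        path.append(v)
        v = v.parent
    return path

def clade_size(leaf_set):
    """Number of leaves below the most recent common ancestor of leaf_set."""
    common = set(ancestors(leaf_set[0]))
    for v in leaf_set[1:]:
        common &= set(ancestors(v))
    mrca = next(a for a in ancestors(leaf_set[0]) if a in common)
    stack, count = [mrca], 0
    while stack:
        node = stack.pop()
        count += node.leaf
        stack.extend(node.children)
    return count

rng = random.Random(0)
n, k, alpha, trials = 50, 5, 0.5, 2000
sizes = []
for _ in range(trials):
    leaves = grow_ford_tree(n, alpha, rng)
    sizes.append(clade_size(rng.sample(leaves, k)))
mono = sum(s == k for s in sizes) / trials
print(f"alpha={alpha}: mean clade size ~ {sum(sizes)/trials:.1f}, P(monophyletic) ~ {mono:.3f}")
```

At α = 0 only pendant edges can be chosen, recovering the Yule case listed among the special cases above.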

6 pages, 694 KB  
Proceeding Paper
Development of Quantitative Structure–Anti-Inflammatory Relationships of Alkaloids
by Cristian Rojas, Doménica Muñoz, Ivanna Cordero, Belén Tenesaca and Davide Ballabio
Chem. Proc. 2024, 16(1), 77; https://doi.org/10.3390/ecsoc-28-20159 - 14 Nov 2024
Viewed by 1319
Abstract
Alkaloids are naturally occurring metabolites with a wide variety of pharmacological activities and applications in science, particularly in medicinal chemistry as anti-inflammatory drugs. Because they can be labelled as active or inactive compounds against the inflammatory biological response, the aim of this work was to calibrate quantitative structure-activity relationships (QSARs) using machine learning classifiers to predict anti-inflammatory activity based on the molecular structures of alkaloids. A dataset of 100 alkaloids (58 active and 42 inactive) was retrieved from two systematic reviews. Molecules were properly curated, and the molecular geometries of the compounds were optimized using the semi-empirical method (PM3) to calculate molecular descriptors, binary fingerprints (extended-connectivity fingerprints and path fingerprints) and MACCS (Molecular ACCess System) structural keys. Then, we calibrated the QSAR models using well-known linear and non-linear machine learning classifiers, i.e., partial least squares discriminant analysis (PLSDA), random forests (RF), adaptive boosting (AdaBoost), k-nearest neighbors (kNN), N-nearest neighbors (N3) and binned nearest neighbors (BNN). For validation purposes, the dataset was randomly split into a training set and a test set in a 70:30 ratio. When using molecular descriptors, genetic algorithms-variable subset selection (GAs-VSS) was used for supervised feature selection. During the calibration of the models, a five-fold Venetian blinds cross-validation was used to optimize the classifier parameters and to control the presence of overfitting. The performance of the models was quantified by means of the non-error rate (NER) statistical parameter. Full article
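
The validation scheme mentioned above can be illustrated as follows: a five-fold Venetian blinds split (every fifth sample assigned to the same fold) with the non-error rate (the mean of per-class sensitivities, i.e. balanced accuracy) as the metric. The descriptors, activity labels, and the k-NN stand-in classifier are placeholders rather than the calibrated QSAR models.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(9)
X = rng.normal(size=(100, 30))          # 100 alkaloids x 30 molecular descriptors (placeholders)
y = rng.integers(0, 2, size=100)        # 1 = anti-inflammatory active

folds = np.arange(len(X)) % 5           # Venetian blinds fold assignment
ners = []
for f in range(5):
    train, test = folds != f, folds == f
    knn = KNeighborsClassifier(n_neighbors=5).fit(X[train], y[train])
    ners.append(balanced_accuracy_score(y[test], knn.predict(X[test])))  # per-fold NER
print("cross-validated NER:", np.round(np.mean(ners), 3))
```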

18 pages, 5069 KB  
Article
Personalization of Affective Models Using Classical Machine Learning: A Feasibility Study
by Ali Kargarandehkordi, Matti Kaisti and Peter Washington
Appl. Sci. 2024, 14(4), 1337; https://doi.org/10.3390/app14041337 - 6 Feb 2024
Cited by 5 | Viewed by 3222
Abstract
Emotion recognition, a rapidly evolving domain in digital health, has witnessed significant transformations with the advent of personalized approaches and advanced machine learning (ML) techniques. These advancements have shifted the focus from traditional, generalized models to more individual-centric methodologies, underscoring the importance of understanding and catering to the unique emotional expressions of individuals. Our study delves into the concept of model personalization in emotion recognition, moving away from the one-size-fits-all approach. We conducted a series of experiments using the Emognition dataset, comprising physiological and video data of human subjects expressing various emotions, to investigate this personalized approach to affective computing. For the 10 individuals in the dataset with a sufficient representation of at least two ground truth emotion labels, we trained a personalized version of three classical ML models (k-nearest neighbors, random forests, and a dense neural network) on a set of 51 features extracted from each video frame. We ensured that all the frames used to train the models occurred earlier in the video than the frames used to test the model. We measured the importance of each facial feature for all the personalized models and observed differing ranked lists of the top features across the subjects, highlighting the need for model personalization. We then compared the personalized models against a generalized model trained using data from all 10 subjects. The mean F1 scores for the personalized models, specifically for the k-nearest neighbors, random forest, and dense neural network, were 90.48%, 92.66%, and 86.40%, respectively. In contrast, the mean F1 scores for the generic models, using the same ML techniques, were 88.55%, 91.78% and 80.42%, respectively, when trained on data from various human subjects and evaluated using the same test set. The personalized models outperformed the generalized models for 7 out of the 10 subjects. The PCA analyses on the remaining three subjects revealed relatively little facial configuration differences across the emotion labels within each subject, suggesting that personalized ML will fail when the variation among data points within a subject’s data is too low. This preliminary feasibility study demonstrates the potential as well as the ongoing challenges with implementing personalized models which predict highly subjective outcomes like emotion. Full article
(This article belongs to the Special Issue Advanced Technologies for Emotion Recognition)
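
The personalized-versus-generalized comparison described above is sketched below on synthetic per-frame features: for each subject a model is trained on that subject's earlier frames and tested on the later ones, and a pooled model trained on all subjects' earlier frames is evaluated on the same test frames. The 80/20 temporal cut, feature values, and binary labels are assumptions standing in for the 51 facial features and emotion labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(10)
n_subjects, frames, n_feat = 10, 400, 51
data = {s: (rng.normal(size=(frames, n_feat)), rng.integers(0, 2, size=frames))
        for s in range(n_subjects)}
cut = int(frames * 0.8)                       # earlier frames train, later frames test

personal, pooled = {}, []
for s, (X, y) in data.items():
    personal[s] = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:cut], y[:cut])
    pooled.append((X[:cut], y[:cut]))
generic = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    np.vstack([x for x, _ in pooled]), np.concatenate([y for _, y in pooled]))

for s, (X, y) in data.items():                # same held-out frames for both models
    f1_p = f1_score(y[cut:], personal[s].predict(X[cut:]))
    f1_g = f1_score(y[cut:], generic.predict(X[cut:]))
    print(f"subject {s}: personalized F1={f1_p:.2f}, generalized F1={f1_g:.2f}")
```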

18 pages, 8887 KB  
Article
Classification of Urban Green Space Types Using Machine Learning Optimized by Marine Predators Algorithm
by Jiayu Yan, Huiping Liu, Shangyuan Yu, Xiaowen Zong and Yao Shan
Sustainability 2023, 15(7), 5634; https://doi.org/10.3390/su15075634 - 23 Mar 2023
Cited by 11 | Viewed by 4022
Abstract
The accuracy of machine learning models is affected by hyperparameters when classifying different types of urban green spaces. To investigate the impact of hyperparametric algorithms on model optimization, this study used the Marine Predators Algorithm (MPA) to optimize three models: K-Nearest Neighbor (KNN), Support Vector Machines (SVM), and Random Forest (RF). The feasibility of the algorithm was illustrated by extracting and analyzing park green space and attached green spaces within the fifth-ring road of Beijing. A dataset of urban green space type labels was constructed using SPOT6. Three optimized models, MPA-KNN, MPA-SVM and MPA-RF, were constructed. The optimum hyperparameter combination was chosen based on the accuracy of the validation set, and the three optimized models were compared in terms of the Area Under Curve (AUC) value, accuracy on the test set, and other indicators. The results showed that applying MPA improves the accuracy of the validation set of the KNN, SVM, and RF models by 4.2%, 2.2%, and 1.2%, respectively. The MPA-RF model had an AUC value of 0.983 and a test set accuracy of 89.93%, indicating that it was the most accurate of the three models. Full article
(This article belongs to the Special Issue Spatiotemporal Data and Urban Sustainability)
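
The hyperparameter-optimization loop described above is illustrated below with a plain random search standing in for the Marine Predators Algorithm, to show what the metaheuristic optimizes (validation accuracy as a function of model hyperparameters); the data, the Random Forest search space, and the iteration budget are placeholders, and the MPA update rules themselves are not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(11)
X = rng.normal(size=(1200, 8))             # spectral/texture features per green-space sample
y = rng.integers(0, 2, size=1200)          # park green space vs. attached green space
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

def fitness(n_estimators, max_depth):      # objective the metaheuristic would maximize
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth,
                                   random_state=0).fit(X_tr, y_tr)
    return accuracy_score(y_val, model.predict(X_val))

best, best_fit = None, -np.inf
for _ in range(30):                        # MPA would instead update a population of candidates
    candidate = (int(rng.integers(50, 500)), int(rng.integers(2, 20)))
    fit = fitness(*candidate)
    if fit > best_fit:
        best, best_fit = candidate, fit
print("best (n_estimators, max_depth):", best, "validation accuracy:", round(best_fit, 3))
```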
