Machine Learning and Data Mining: Theory and Applications

27 pages, 3420 KB

Open Access Article

BRB-Based Classification of Imbalanced Cybersecurity Data in the Industrial Internet

by Yang Zhao, Yanbin Yuan, Yuhe Wang, Qun Han and Shiming Li

Symmetry 2026, 18(6), 916; https://doi.org/10.3390/sym18060916 - 27 May 2026

Viewed by 144

Abstract

Class distribution asymmetry (imbalanced data) is a prevalent problem in Industrial Internet cybersecurity, where normal data far outnumber abnormal data. This imbalance biases traditional machine learning classifiers towards the majority class and severely degrades their attack detection capability. To address this issue while meeting the requirement for a traceable decision-making process in industrial scenarios, this paper proposes an imbalanced data classification method based on the Belief Rule Base (BRB). First, the Cluster-Based Oversampling (CBO) algorithm is employed to restore the symmetry of the class distribution at the data level. Then, the Evidential Reasoning (ER) iterative algorithm is used to fuse attributes, which reduces the number of antecedent attributes of the BRB while preserving their information, effectively alleviating the rule explosion problem. Finally, interpretable classification is performed with the BRB, and the Circle chaotic mapping Gray Wolf Optimizer (Circle-GWO) algorithm is introduced for model construction, parameter optimization, and fine-tuning. Experimental results on the UNSW-NB15 and TON_IoT datasets demonstrate that the proposed method can effectively handle imbalanced data classification tasks in this field, providing a practical technical solution for improving the accuracy and efficiency of cybersecurity decision-making in the Industrial Internet.

(This article belongs to the Topic Machine Learning and Data Mining: Theory and Applications)
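The Circle chaotic map named in the abstract is commonly used to seed metaheuristic populations more evenly than independent uniform draws. The sketch below assumes the frequently cited form x_{k+1} = (x_k + b − (a/2π)·sin(2πx_k)) mod 1 with a = 0.5 and b = 0.2, a starting value x0 = 0.7, and hypothetical helpers `circle_map_sequence` and `circle_map_population`; the paper's exact constants and how it embeds the map in GWO may differ.

```python
import math

def circle_map_sequence(length, x0=0.7, a=0.5, b=0.2):
    """Generate `length` values in [0, 1) from the Circle chaotic map (assumed a, b)."""
    values = []
    x = x0
    for _ in range(length):
        # Circle map: x_{k+1} = (x_k + b - (a / 2*pi) * sin(2*pi*x_k)) mod 1
        x = (x + b - (a / (2 * math.pi)) * math.sin(2 * math.pi * x)) % 1.0
        values.append(x)
    return values

def circle_map_population(n_wolves, dim, lower, upper, x0=0.7):
    """Hypothetical GWO initialiser: scale chaotic values onto [lower, upper]."""
    chaos = circle_map_sequence(n_wolves * dim, x0=x0)
    population = []
    for i in range(n_wolves):
        row = chaos[i * dim:(i + 1) * dim]
        population.append([lower + c * (upper - lower) for c in row])
    return population

if __name__ == "__main__":
    wolves = circle_map_population(n_wolves=5, dim=3, lower=-1.0, upper=1.0)
    for wolf in wolves:
        print([round(v, 3) for v in wolf])
```

Because the map is deterministic, the same x0 always reproduces the same initial pack, which can make optimizer runs easier to compare.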

25 pages, 8836 KB

Open Access Article

Dual-Tensor Constrained Multi-View Subspace Clustering

by Guanghui Li, Yue Qian, Yong Cheng, You Huang, Lingbin Zeng, Shixin Yao and Xingkong Ma

Appl. Sci. 2026, 16(10), 4766; https://doi.org/10.3390/app16104766 - 11 May 2026

Viewed by 164

Abstract

Existing multi-view clustering approaches based on matrix factorization often fail to jointly capture global high-order correlations and local view-specific characteristics, and they typically suffer from instability when generating the final clustering labels. To overcome these limitations, this paper presents a multi-view subspace clustering method termed dual-tensor constrained multi-view subspace clustering (DTCMVSC). Specifically, for each view, we learn an independent latent representation matrix, a projection matrix, and a basis matrix. The latent representations and projection matrices are stacked into third-order tensors, on which tensor nuclear norm regularization is imposed to simultaneously exploit consensus structures and complementary information across views. Additionally, a consensus regularization term and adaptive view weights are introduced to align the latent representations of different views toward a unified consensus subspace. The resulting optimization problem is efficiently solved under the ADMM framework, after which a similarity matrix is constructed from the consensus representation and spectral clustering is performed to obtain the final labels. Experimental evaluations on six benchmark datasets demonstrate the superiority of DTCMVSC. For example, it achieves an ACC of 86.10% on CMU and an NMI of 94.17% on ORL, exceeding the lowest-performing state-of-the-art baselines by 63.08 and 18.53 percentage points, respectively.

(This article belongs to the Topic Machine Learning and Data Mining: Theory and Applications)
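The tensor nuclear norm regularizer mentioned above is usually defined through the t-SVD: apply the FFT along the third (view) mode, then average the matrix nuclear norms of the resulting frontal slices. The NumPy sketch below follows that common convention, TNN(X) = (1/n₃) Σₖ ‖X̂⁽ᵏ⁾‖_*; the function name `tensor_nuclear_norm` is illustrative, and the paper may use a differently scaled or weighted variant.

```python
import numpy as np

def tensor_nuclear_norm(tensor):
    """t-SVD based tensor nuclear norm of an (n1, n2, n3) array."""
    n3 = tensor.shape[2]
    # Transform to the Fourier domain along the third (view) mode.
    freq = np.fft.fft(tensor, axis=2)
    total = 0.0
    for k in range(n3):
        # Sum of singular values of each (generally complex) frontal slice.
        total += np.linalg.svd(freq[:, :, k], compute_uv=False).sum()
    return total / n3

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stack three per-view 20x5 representation matrices into a 20x5x3 tensor.
    views = [rng.standard_normal((20, 5)) for _ in range(3)]
    stacked = np.stack(views, axis=2)
    print(tensor_nuclear_norm(stacked))
```

Under this definition, a tensor whose frontal slices are identical has the same TNN as a single slice's nuclear norm, so penalizing TNN rewards low-rank structure that the views share.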

18 pages, 708 KB

Open Access Article

NSCH-Flourishing-ML: A Curated Dataset and Reproducible Pipeline for Machine Learning Analysis of Child Flourishing

by Miguel Arcos-Argudo, Rodolfo Bojorque, Fernando Pesántez and Kely Nieto-Andrade

Data 2026, 11(5), 103; https://doi.org/10.3390/data11050103 - 3 May 2026

Viewed by 408

Abstract

Large-scale population surveys provide valuable information for studying child well-being, yet their structure often limits the direct application of machine-learning methods. The National Survey of Children’s Health (NSCH) is one of the most comprehensive datasets for monitoring children’s health and development in the United States, but the raw survey files contain logical skip patterns, categorical variables, and complex survey-design elements that require substantial preprocessing before predictive analysis can be performed. This study presents a curated, machine-learning-ready benchmark dataset derived from the 2023 NSCH, together with a fully reproducible computational pipeline for studying school-age child flourishing. The workflow constructs a binary flourishing outcome from four survey items related to curiosity, task persistence, emotional self-regulation, and interest in doing well in school. After restricting the sample to children aged 6–17 years and retaining only records with valid responses to all four outcome items, the final analytical dataset contained 32,934 observations. Feature selection based on mutual information computed on the training partition, combined with cross-validated subset-size selection, yielded a final benchmark subset of 150 predictors. Baseline experiments using logistic regression and random forest showed stable and reasonably strong predictive performance, with held-out ROC-AUC values of around 0.84–0.85 and closely aligned cross-validation results. An exploratory comparison of weighted and unweighted learning further showed that survey weighting did not improve discriminative performance in this benchmark setting, although the observed differences were modest and model-dependent. By releasing both the curated benchmark dataset and the reproducible pipeline, this study provides a reusable resource for machine-learning research on child well-being and survey-based computational benchmarking.

(This article belongs to the Topic Machine Learning and Data Mining: Theory and Applications)
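The abstract stresses that mutual information is computed on the training partition only, which keeps held-out data out of feature selection. A minimal pure-Python sketch of that step for categorical survey items follows. It uses a plug-in (empirical-frequency) MI estimate in nats, and the helper names `mutual_information` and `select_top_k` are hypothetical; the released pipeline may use a different estimator, such as scikit-learn's `mutual_info_classif`.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (xi, yi), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[xi] / n) * (py[yi] / n)))
    return mi

def select_top_k(train_rows, train_labels, feature_names, k):
    """Rank features by MI with the label, using training rows only."""
    scores = {}
    for j, name in enumerate(feature_names):
        column = [row[j] for row in train_rows]
        scores[name] = mutual_information(column, train_labels)
    ranked = sorted(feature_names, key=lambda f: scores[f], reverse=True)
    return ranked[:k], scores
```

The selected names would then be applied unchanged to the test partition; recomputing scores on test data would leak outcome information into the benchmark.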

23 pages, 13014 KB

Open Access Article

Seasonal Estimation of Net Surface Shortwave Radiation Using Multiple Machine Learning Algorithms, Remote Sensing Observation, and In-Situ Station

by Nuan Wang, Shisong Cao, Mingyi Du, Jingyi Chen, Ling Li, Yang Liu and Huiping Sun

Appl. Sci. 2026, 16(9), 4370; https://doi.org/10.3390/app16094370 - 29 Apr 2026

Viewed by 284

Abstract

Net surface shortwave radiation (NSSR) is a key parameter in the Earth’s energy cycle and strongly affects the global water and heat balance. Currently, comprehensive comparisons of the accuracy of different models are lacking, and seasonal radiative drivers have not been systematically explored. Therefore, we developed a machine learning-based seasonal NSSR estimation model. By integrating in-situ observations with multi-source remote sensing datasets, we achieved precise quantification of radiative fluxes. The proposed framework employs three tree-ensemble algorithms, namely Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), to capture the non-linear interactions among radiative drivers across the four seasons. Through mechanistic sensitivity analysis, we quantified the impacts of key variables on NSSR prediction. The results showed that RF performed best, with seasonal R² values of 0.95 (spring), 0.89 (summer), 0.95 (autumn), and 0.96 (winter). The Solar Zenith Angle (SZA) dominated in spring and winter; excluding it reduced R² by 0.23 and raised RMSE by 20.66–26.42 W/m². Meteorological factors mattered most in summer; excluding them reduced R² by 0.17 and raised RMSE by 23.82 W/m². This study provides actionable insights for terrestrial radiation budget research.

(This article belongs to the Topic Machine Learning and Data Mining: Theory and Applications)
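The sensitivity figures above are differences in R² and RMSE between a model trained with all predictors and one trained without a given variable or group. The arithmetic can be sketched as follows; `ablation_impact` is a hypothetical helper that takes predictions from the two models, and retraining the Random Forest itself is left out of this sketch.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root-mean-square error in the units of y (e.g. W/m²)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def ablation_impact(y_true, pred_full, pred_ablated):
    """Return (R² drop, RMSE increase) caused by removing a predictor group."""
    r2_drop = r2_score(y_true, pred_full) - r2_score(y_true, pred_ablated)
    rmse_rise = rmse(y_true, pred_ablated) - rmse(y_true, pred_full)
    return r2_drop, rmse_rise
```

A positive R² drop together with a positive RMSE rise marks the removed group as informative for that season.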

19 pages, 522 KB

Open Access Article

Halpin’s Differential Test Functioning via Robust Linking: A Comparison of Bisquare and L₀ Loss Functions

by Alexander Robitzsch

Information 2026, 17(5), 428; https://doi.org/10.3390/info17050428 - 29 Apr 2026

Viewed by 392

Abstract

Differential test functioning (DTF) assesses, within an item response model, whether differential item functioning (DIF) affects the test as a whole. A recent contribution by Halpin (2025, arXiv) introduced a DTF statistic defined as the difference between a robust linking method based on the bisquare loss function and a nonrobust linking method such as mean–mean linking. The present article applies this statistic in the context of robust mean–geometric mean linking using the L₀ loss function and compares it with Halpin’s original bisquare-loss approach. Several confidence interval estimation methods are evaluated for statistical inference on the DTF statistic. The findings indicate that, under several conditions, the L₀ loss function yields smaller bias in the group mean estimate than the bisquare loss function. However, the DTF statistic is estimated more precisely with the bisquare loss than with the L₀ loss. Moreover, the most satisfactory statistical inference is obtained from bias-corrected bootstrap and basic bootstrap confidence intervals based on a parametric rather than a nonparametric bootstrap.

(This article belongs to the Topic Machine Learning and Data Mining: Theory and Applications)
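Halpin’s statistic, as summarized above, contrasts a robust and a nonrobust estimate of the group mean difference. In a deliberately simplified setting where linking reduces to estimating a location shift from per-item difficulty differences, mean–mean linking is the plain average and bisquare linking can be computed by iteratively reweighted averaging. The sketch below assumes that setting, a user-chosen tuning constant `c`, and hypothetical function names; it does not implement full IRT linking or the L₀ variant studied in the article.

```python
import statistics

def bisquare_weight(residual, c):
    """Tukey bisquare weight: (1 - (r/c)^2)^2 inside (-c, c), zero outside."""
    if abs(residual) >= c:
        return 0.0
    u = residual / c
    return (1.0 - u * u) ** 2

def bisquare_location(diffs, c=0.5, tol=1e-10, max_iter=100):
    """Robust location estimate via iteratively reweighted least squares."""
    mu = statistics.median(diffs)  # robust starting value
    for _ in range(max_iter):
        weights = [bisquare_weight(d - mu, c) for d in diffs]
        total = sum(weights)
        if total == 0.0:
            break  # every item is down-weighted to zero; keep the current value
        new_mu = sum(w * d for w, d in zip(weights, diffs)) / total
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

def dtf_statistic(diffs, c=0.5):
    """Robust (bisquare) minus nonrobust (mean-mean) group mean estimate."""
    return bisquare_location(diffs, c) - statistics.mean(diffs)
```

With one strongly DIF-affected item among otherwise balanced differences, the bisquare estimate ignores that item while the mean is pulled towards it, so the statistic moves away from zero.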

24 pages, 4544 KB

Open Access Article

DualGAD: A Generalist Graph Anomaly Detection Method via Dual-Encoder Architecture

by Jizhao Liu, Shuo Mao, Shuqin Zhang, Fangfang Shan and Jun Li

Information 2026, 17(5), 416; https://doi.org/10.3390/info17050416 - 27 Apr 2026

Viewed by 361

Abstract

Because graph structures can model complex relationships, graph anomaly detection has significant application value in domains such as financial fraud detection, network security, and fake account identification. Traditional graph anomaly detection methods follow a specialized “one dataset, one model” paradigm, which requires retraining or fine-tuning a model for each new domain. In practice, this leads to high deployment costs and limited generalization. Generalist graph anomaly detection instead aims to “train once, apply across domains”. However, existing generalist methods rely primarily on graph neural networks to learn structural information implicitly; the learned structural representations are tightly coupled to specific topology distributions, which limits their stability under domain shift. To address this limitation, we propose DualGAD, a generalist graph anomaly detection method based on a dual-encoder architecture. DualGAD introduces explicit structural modeling that characterizes the topological deviation of each node relative to the overall graph structure, thereby enhancing structural invariance across heterogeneous domains. The method models node attribute information and explicit structural information separately, via an attribute feature encoder and an explicit structural feature encoder, and combines them with an “attribute-dominant, structure-complementary” fusion strategy. Experiments on eight real-world datasets show that DualGAD improves AUROC by an average of 3.12% over the strongest baseline methods, indicating strong cross-domain generalization.

(This article belongs to the Topic Machine Learning and Data Mining: Theory and Applications)
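The abstract does not specify how a node’s relative topological deviation is computed. As a purely hypothetical stand-in, the sketch below standardizes each node’s degree against the graph’s own mean and standard deviation. Expressing the feature relative to the graph rather than in absolute units is one simple way to keep a structural signal comparable across graphs of different sizes and densities. The function `degree_deviation` is illustrative only and is not DualGAD’s structural encoder.

```python
import math
from collections import defaultdict

def degree_deviation(edges, num_nodes):
    """Z-score of each node's degree relative to the whole (undirected) graph."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    degs = [degree[n] for n in range(num_nodes)]
    mean = sum(degs) / num_nodes
    std = math.sqrt(sum((d - mean) ** 2 for d in degs) / num_nodes)
    if std == 0.0:
        return [0.0] * num_nodes  # regular graph: no node deviates structurally
    return [(d - mean) / std for d in degs]
```

In a star graph, the hub receives a large positive score and the leaves small negative ones; in a regular graph, such as a cycle, every node scores zero.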

20 pages, 1426 KB

Open Access Review

Profiling Decision-Making Styles Under Healthcare Resource Scarcity: An Interdisciplinary Clustering Approach

by Micaela Pinho, Fátima Leal and Isabel Miguel

Information 2026, 17(3), 287; https://doi.org/10.3390/info17030287 - 14 Mar 2026

Cited by 1 | Viewed by 665

Abstract

Scarcity of healthcare resources requires prioritisation decisions that raise complex ethical, economic, and social challenges. While normative frameworks provide guidance on how such decisions ought to be made, growing evidence suggests that individuals differ substantially in how they approach morally charged allocation choices. This study investigates heterogeneity in decision-making styles and in support for healthcare prioritisation criteria, using an interdisciplinary approach that integrates health economics, social psychology, and computational methods to identify latent decision-making profiles. Data were collected from adults residing in Portugal using a structured online questionnaire covering socio-demographic characteristics, decision-making styles, and preferences elicited through twenty hypothetical healthcare rationing scenarios. The results reveal three meaningful decision-making profiles characterised by different combinations of cognitive styles and ethical prioritisation patterns: analytically oriented decision-makers who prioritise health gains; intuitive, context-sensitive decision-makers who balance clinical and social criteria; and heuristic-driven decision-makers who rely on simpler, less differentiated rules. These findings indicate that, within this sample, healthcare prioritisation preferences are shaped by systematic variations in decision style rather than by a single moral or rational framework. By linking behavioural heterogeneity with ethical decision-making, this study contributes to theoretical debates on healthcare rationing and illustrates the value of clustering techniques for uncovering latent structures in complex decision data. The results provide insights relevant to the design of decision-support systems and rationing policies, which may be adapted to accommodate heterogeneous decision styles in comparable settings.

(This article belongs to the Topic Machine Learning and Data Mining: Theory and Applications)

25 pages, 639 KB

Open Access Article

A Sparse L∞-Norm Regularized Least Squares Support Vector Regression

by Xiaoyong Liu, Dong Li and Chengbin Zeng

Algorithms 2026, 19(2), 160; https://doi.org/10.3390/a19020160 - 18 Feb 2026

Cited by 1 | Viewed by 526

Abstract

Although Least Squares Support Vector Regression (LSSVR) reduces the number of hyperparameters to two, it sacrifices sparsity: every training sample becomes a support vector, which increases storage costs. In contrast, standard Support Vector Regression (SVR) preserves sparsity but requires tuning three highly coupled [...] L∞-norm regularized least squares SVR framework that incorporates the infinity norm of the approximation errors into both the objective function and the inequality constraints. The resulting optimization problem minimizes model complexity while controlling the maximum prediction deviation through a single slack variable, thereby turning the conventional three-hyperparameter SVR tuning task into a two-parameter problem involving only the regularization coefficient and the kernel width. This formulation restores sparsity by yielding a compact support vector set while preserving the stability and convexity advantages of LSSVR. Experiments on both static and dynamic datasets show that the proposed method consistently achieves higher predictive accuracy and greater robustness than standard SVR and LSSVR. These results indicate that the proposed L∞-norm regularized framework offers a mathematically principled and computationally efficient alternative for sparse, robust, and scalable regression modeling.

(This article belongs to the Topic Machine Learning and Data Mining: Theory and Applications)
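The abstract contrasts the proposed model with LSSVR, whose dual solution is dense. To make that contrast concrete, the NumPy sketch below fits a standard LSSVR by solving its linear KKT system [[0, 1ᵀ], [1, K + I/γ]]·[b; α] = [0; y] with an RBF kernel of width σ, the same two hyperparameters the abstract mentions. It then reports the largest absolute training residual, the L∞ quantity that the proposed method controls directly. This is the LSSVR baseline, not the authors’ L∞-norm formulation, and the function names and default values are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def fit_lssvr(X, y, gamma=10.0, sigma=1.0):
    """Solve the LSSVR dual linear system for bias b and coefficients alpha."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = 1.0               # constraint: sum(alpha) = 0
    system[1:, 0] = 1.0               # bias column
    system[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(system, rhs)
    return solution[0], solution[1:]

def predict_lssvr(X_train, alpha, b, X_new, sigma=1.0):
    """f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.linspace(0, 6, 40)[:, None]
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)
    b, alpha = fit_lssvr(X, y, gamma=10.0, sigma=1.0)
    residuals = y - predict_lssvr(X, alpha, b, X, sigma=1.0)
    # alpha_i = gamma * e_i, so it is nonzero unless the fit is exact: no sparsity.
    print("nonzero coefficients:", int(np.sum(np.abs(alpha) > 1e-8)), "of", len(alpha))
    print("max |residual| (L-infinity):", float(np.max(np.abs(residuals))))
```

Because each coefficient equals γ times its training error, LSSVR spreads small nonzero weights over every sample, which is the storage cost the abstract attributes to it.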

Displaying articles 1-8

Journal	Impact Factor	CiteScore	Launched	First Decision (median, days)	APC (CHF)
Algorithms	2.1	4.5	2008	19.2	1800
Applied Sciences	2.5	5.5	2011	16	2400
AppliedMath	0.7	1.1	2021	20.6	1200
Data	2.0	5.0	2016	25	1600
Information	2.9	6.5	2010	20.9	1800
Symmetry	2.2	5.3	2009	15.8	2400
