MDPI - Publisher of Open Access Journals

19 pages, 352 KB

Open Access Article

Denominational Differentiation and Religiosity Among the Hungarian Minority of Transylvania: Evidence from the European Values Study

by Levente Székedi

Religions 2026, 17(6), 647; https://doi.org/10.3390/rel17060647 - 27 May 2026

Viewed by 200

Abstract

The Hungarian minority of Transylvania comprises four historically received denominations—Roman Catholic, Reformed, Unitarian, and Lutheran—whose institutional profiles differ markedly despite their shared function as carriers of minority cultural identity. Using the European Values Study 2017 Romanian Hungarian minority oversample (GESIS ZA7550; N = 1106), this article presents the first regression-based analysis of intra-community denominational variation in religiosity in this dataset. Four binary logistic regression models test whether denomination independently predicts church attendance, confidence in church, subjective importance of religion, and self-described religiosity type (institutional versus personalised), net of sociodemographic controls. Catholics attend services significantly more frequently than Reformed members, while Reformed members express higher confidence in their church—a practice–trust reversal explicable by the distinction between canonical obligation and ethnic embeddedness. Subjective religious importance does not vary by denomination, consistent with an identity-protection mechanism operating uniformly across confessions. Denomination does not independently predict institutional versus personalised religiosity type once sociodemographic controls are applied, with age emerging as the dominant axis of variation on this dimension. The findings engage with Davie’s believing/belonging/behaving framework and the debate on whether denominational cleavage or the secular–religious divide constitutes the primary axis of religious differentiation in contemporary Europe.
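
As a reading aid for the modelling step, the sketch below shows how a binary logistic regression with a dummy-coded denomination term is interpreted: the coefficient on a denomination dummy, exponentiated, is the odds ratio relative to the reference category, holding the controls fixed. Every coefficient, the variable names, and the choice of Reformed as the reference category are hypothetical illustrations, not estimates from the article.

```python
import math

# Hypothetical coefficients for a binary outcome (e.g., regular attendance).
# Reformed is ASSUMED to be the reference category, so it has no dummy term.
INTERCEPT = -0.40
COEFS = {
    "catholic": 0.55,      # hypothetical dummy coefficient
    "unitarian": 0.10,     # hypothetical dummy coefficient
    "lutheran": -0.05,     # hypothetical dummy coefficient
    "age_decades": 0.20,   # hypothetical sociodemographic control
}


def predicted_probability(features: dict) -> float:
    """Inverse logit of the linear predictor; features not given count as 0."""
    eta = INTERCEPT + sum(COEFS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-eta))


def odds_ratio(name: str) -> float:
    """exp(beta): multiplicative change in the odds versus the reference category."""
    return math.exp(COEFS[name])


# Same age, different denomination: only the dummy term changes.
reformed = predicted_probability({"age_decades": 5})
catholic = predicted_probability({"catholic": 1, "age_decades": 5})
print(f"P(Reformed) = {reformed:.3f}, P(Catholic) = {catholic:.3f}")
print(f"Catholic vs Reformed odds ratio = {odds_ratio('catholic'):.3f}")
```

Because the model is linear on the log-odds scale, the odds ratio is the same at every age, whereas the difference in predicted probabilities depends on where the other covariates are held.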

(This article belongs to the Section Religions and Health/Psychology/Social Sciences)

31 pages, 1116 KB

Open Access Article

AI-Driven Clustering-Based Stratification of Allergic Patients Towards Smart Healthcare Systems in Southern Italy

by Stefano Palazzo, Esra Hazar, Arife Uslu Gokceoglu, Giovanni Zambetta, Roberto Caldelli and Claudio Loconsole

Computers 2026, 15(5), 296; https://doi.org/10.3390/computers15050296 - 7 May 2026

Viewed by 406

Abstract

A clustering analysis was conducted to identify distinct patient subgroups based on White Blood Cell (WBC) count, age, and total immunoglobulin E (IgE). All data were obtained from a coordinated primary care network operating in Apulia (Southern Italy). We analyzed 300 patient records, performed preprocessing and exploratory data analysis, and then applied unsupervised clustering directly to the standardized three-variable feature space (Age, WBC, and Total IgE), followed by supervised validation steps. Several clustering algorithms were compared. Among them, K-means and Spectral Clustering showed the most favorable internal validation profiles, based on the Silhouette Score (SS), Calinski–Harabasz Index (CH), and Davies–Bouldin Index (DB). K-means achieved the best scores (SS = 0.406, CH = 190.00, DB = 0.900), closely followed by Spectral Clustering (SS = 0.398, CH = 182.57, DB = 0.936), outperforming Agglomerative Clustering (SS = 0.361, CH = 160.41, DB = 1.016) and Gaussian Mixture Models (SS = 0.233, CH = 103.89, DB = 1.289). Post-clustering ANOVA indicated significant differences in WBC, age, and total IgE across the five consensus clusters. Internal cluster separability was then evaluated by training a Random Forest classifier to predict cluster membership. The results indicate internal cluster separability within the analyzed dataset, but further external verification and clinical evidence are needed for validation. To help validate the model-based findings, the research group developed clinical descriptions and suggested treatment plans and identified co-existing diseases. A simplified cluster-informed clinical summary based on biomarker ranges was derived to support interpretation of the identified patient profiles.
This integrated method preliminarily suggests that patient strata may be identified from routine clinical variables, while highlighting the importance of internal validation and clinical interpretability in clustering research.
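
The three internal validation indices named above can be computed directly from a feature matrix and a vector of cluster labels. The pure-Python sketch below follows the textbook definitions (the same ones used by common libraries such as scikit-learn), not the authors' code; the clustering step itself is not shown, and the (age, WBC, total IgE) records and labels are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def standardize(rows):
    """Z-score every column (population standard deviation)."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]


def _groups(points, labels):
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return groups


def _centroid(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points; higher is better."""
    groups = _groups(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(math.dist(p, q) for q in pts) / len(pts)
                for other, pts in groups.items() if other != lab)
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (S_i + S_j) / d(c_i, c_j); lower is better."""
    groups = _groups(points, labels)
    cents = {k: _centroid(v) for k, v in groups.items()}
    scatter = {k: sum(math.dist(p, cents[k]) for p in v) / len(v) for k, v in groups.items()}
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in groups if j != i) for i in groups]
    return sum(worst) / len(worst)


def calinski_harabasz(points, labels):
    """Between- over within-cluster dispersion, each per degree of freedom; higher is better."""
    groups = _groups(points, labels)
    n, k = len(points), len(groups)
    overall = _centroid(points)
    between = within = 0.0
    for v in groups.values():
        c = _centroid(v)
        between += len(v) * math.dist(c, overall) ** 2
        within += sum(math.dist(p, c) ** 2 for p in v)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical (age, WBC, total IgE) records and labels from some clustering run.
rows = [(25, 6.1, 40), (27, 6.4, 55), (30, 5.9, 48),
        (62, 9.8, 410), (65, 10.2, 390), (60, 9.5, 450)]
labels = [0, 0, 0, 1, 1, 1]
X = standardize(rows)
print(round(silhouette(X, labels), 3), round(davies_bouldin(X, labels), 3),
      round(calinski_harabasz(X, labels), 1))
```

Standardizing first matters here: without it, total IgE (hundreds of units) would dominate every Euclidean distance and the indices would mostly reflect a single variable.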

(This article belongs to the Special Issue Application of Artificial Intelligence and Modeling Frameworks in Health Informatics and Related Fields)

33 pages, 3735 KB

Open Access Feature Paper Article

Artificial Neural Network-Based Classification of Industrial Sustainability Profiles for Differentiated Fiscal Policy Design in Remanufacturing Processes

by Marta Lilia Eraña-Díaz, Juana Enríquez-Urbano, Beatriz Martínez-Bahena, Jazmin Yanel Juárez-Chávez, Alfonso D’Granda-Trejo and Javier De-la-Rosa-Mondragon

Processes 2026, 14(9), 1501; https://doi.org/10.3390/pr14091501 - 6 May 2026

Viewed by 464

Abstract

The design of differentiated fiscal instruments for industrial sustainability requires robust, data-driven tools capable of capturing the heterogeneity of environmental performance across manufacturing units—a challenge that conventional econometric approaches address only partially, given the non-linear nature of operational–environmental interactions in reconfigurable production systems. This study introduces a two-phase computational framework that integrates unsupervised machine learning and supervised classification to generate evidence-based sustainability profiles for fiscal policy targeting. Its principal contribution is the combination of K-Means clustering with a binary artificial neural network (ANN) classifier, operationalized through an accessible decision-support interface that enables differentiated incentive allocation without requiring programming expertise from policymakers. A dataset of 1000 manufacturing records comprising seven operational and technological input variables—material usage, production capacity, reconfiguration time, downtime, AI optimization, IoT connectivity, and predictive maintenance—and three environmental output indicators—energy consumption, carbon emissions, and waste generation—was analyzed. In Phase One, K-Means segmentation with k = 6, selected through multi-criteria convergence (Silhouette = 0.102; Elbow, Davies–Bouldin, and Calinski–Harabasz indices), identified six distinct sustainability profiles with marked environmental differentiation. In Phase Two, a binary ANN classifier (architecture: 7 → 64 → 32 → 1 neurons; ReLU and sigmoid activations) was trained to distinguish the reference cluster C0 (low environmental impact: energy 145.1 kWh, emissions 45.2 CO₂-eq) from the high-impact cluster C1 (emissions 67.8 CO₂-eq, waste 41.5 kg). 
The trained classifier achieved an overall accuracy of 75.4% and an AUC-ROC of 0.774 on the held-out test set, with a macro-averaged F1-score of 0.753 and a Cohen’s kappa coefficient of 0.508, indicating moderate agreement beyond chance. Class C1 (high-impact establishments) achieved a precision of 0.794 and a recall of 0.730, supporting reliable identification of manufacturing units that would most benefit from targeted fiscal support. The framework is deployed through a Gradio-based graphical interface incorporating a traffic-light sustainability classification (green/yellow/red), enabling direct and interactive application by tax authorities and industrial policymakers. The modular architecture supports adaptation to larger or sector-specific datasets, making it transferable across industrial policy contexts.
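
The reported test-set figures (accuracy, class-C1 precision and recall, macro-averaged F1, Cohen's kappa) all derive from the binary confusion matrix. The sketch below shows those standard formulas, assuming C1 (high impact) is encoded as the positive class 1 and C0 as 0; the label vectors are hypothetical, not the study's predictions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, positive-class precision/recall, macro F1 and Cohen's kappa."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    n = len(pairs)

    def f1(prec, rec):
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

    prec_pos = tp / (tp + fp) if tp + fp else 0.0
    rec_pos = tp / (tp + fn) if tp + fn else 0.0
    prec_neg = tn / (tn + fn) if tn + fn else 0.0
    rec_neg = tn / (tn + fp) if tn + fp else 0.0

    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predictions and ground truth.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 0.0
    return {
        "accuracy": observed,
        "precision": prec_pos,
        "recall": rec_pos,
        "macro_f1": (f1(prec_pos, rec_pos) + f1(prec_neg, rec_neg)) / 2,
        "kappa": kappa,
    }


# Hypothetical held-out labels: 50 high-impact (1) and 50 low-impact (0) units.
y_true = [1] * 50 + [0] * 50
y_pred = [1] * 40 + [0] * 10 + [1] * 10 + [0] * 40
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Kappa discounts the agreement expected from the class marginals alone, which is why it sits well below accuracy even on balanced data.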

(This article belongs to the Special Issue Thermal Engineering: Energy Conversion, Numerical Simulation, and Advanced Control)

45 pages, 9546 KB

Open Access Article

Unsupervised Hierarchical Visual Taxonomy of Marble Natural Stone Using Cluster-Aware Self-Supervised Vision Transformers

by Margarida Figueiredo, Carlos M. A. Diogo, Gustavo Paneiro, Pedro Amaral and António Alves de Campos

Appl. Sci. 2026, 16(9), 4137; https://doi.org/10.3390/app16094137 - 23 Apr 2026

Viewed by 281

Abstract

The marble industry relies on proprietary commercial names rather than objective visual categories, creating market inefficiencies for stakeholders who select stones based on appearance. Supervised classification perpetuates this problem by replicating inconsistent commercial labels instead of discovering intrinsic visual structure. We propose an unsupervised pipeline that combines a two-stage training strategy (pure self-supervised pretraining followed by cluster-aware fine-tuning of a DINO Vision Transformer) with empirically selected dimensionality reduction and agglomerative hierarchical clustering. Systematic ablation studies on 1480 marble images spanning 10 commercial varieties validate each design choice: cluster-aware training at k = 10 yields geometrically improved embeddings over the self-supervised baseline (mean Silhouette Score 0.693 ± 0.053 vs. 0.660 ± 0.030; mean Davies–Bouldin Index 0.386 ± 0.075 vs. 0.569 ± 0.012; N = 9 independent evaluations across 3 data partitions × 3 training initializations). The resulting taxonomy reveals three phenomena invisible to commercial classification: cross-category merging of visually indistinguishable stones carrying different market names, intra-category splitting of heterogeneous sub-populations within single varieties, and coherent grouping where commercial and visual boundaries coincide, with all three confirmed in every independent run. We further demonstrate that standard extrinsic metrics are misaligned with unsupervised taxonomy objectives when reference labels encode the inconsistencies the method aims to resolve. Validating this methodology across diverse stone types, larger datasets, and varied acquisition conditions represents a natural and necessary next step toward establishing its cross-domain generalizability.
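
The final stage of the pipeline, agglomerative hierarchical clustering, merges the two closest clusters repeatedly until a target number remains; cutting at different k yields nested levels of a taxonomy. The abstract does not state the linkage criterion, so the naive sketch below assumes average linkage, and the 2-D points are hypothetical stand-ins for reduced ViT embeddings.

```python
import math
from itertools import combinations


def average_linkage(points, k):
    """Naive agglomerative clustering (average linkage), cut at k clusters."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise distance between members of clusters a and b.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # combinations yields i < j, so deleting index j after merging is safe.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Hypothetical reduced embeddings for six stone images.
embeddings = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.0), (9.1, 0.3)]
fine = average_linkage(embeddings, 3)    # three visual groups
coarse = average_linkage(embeddings, 2)  # two nearest groups merged
print(fine, coarse)
```

This brute-force version recomputes linkages at every step and scales poorly; it is meant only to show the merge logic that produces a nested hierarchy.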

(This article belongs to the Special Issue Recent Advances and New Trends in Computer Vision and Image Processing)

19 pages, 2380 KB

Open Access Article

DTBAffinity: A Multi-Modal Feature Engineering and Gradient-Boosting Framework for Drug–Target Binding Affinity on Davis and KIBA Benchmarks

by Meshari Alazmi

Computers 2026, 15(3), 182; https://doi.org/10.3390/computers15030182 - 10 Mar 2026

Viewed by 832

Abstract

Accurately predicting how strongly a drug binds to its target (the biomolecule on which it acts to produce the desired effect) is very important for drug discovery: it helps select the most promising compounds and saves money by reducing the number of experiments needed. We present DTBAffinity, a multi-modal regression framework that integrates chemically meaningful ligand descriptors with diverse protein sequence features in a unified gradient-boosting model. Ligands are represented by physicochemical and topological descriptors (RDKit and Mordred), structural keys (MACCS and FP4), circular fingerprints (ECFP/Morgan), and SMILES-derived features from iFeatureOmega. For proteins, thousands of sequence-derived descriptors (composition, autocorrelations, physicochemical profiles, and evolutionary indices) from iFeatureOmega are used, together with contextual embeddings from large protein language models (ESM-1b, ESM-2). The feature matrices are cleaned, variance-filtered, z-score scaled, and reduced by univariate feature selection before being concatenated and modeled with regularized XGBoost ensembles. We evaluate DTBAffinity on two commonly used kinase-centric datasets: Davis (30,056 interactions; pKd values) and KIBA (118,254 interactions; integrated affinity scores). Performance is measured with several metrics, including MSE, R², Pearson/Spearman correlations, Concordance Index (CI), r_m², and AUPR. On Davis, DTBAffinity yields MSE = 0.1885, CI = 0.9102, and AUPR = 0.8112; on KIBA, it gives MSE = 0.1540, CI = 0.8686, and AUPR = 0.8361, outperforming state-of-the-art baselines such as KronRLS, SimBoost, DeepDTA, and GraphDTA. These findings suggest that combining interpretable descriptors and contextual embeddings in a robust boosting framework is an effective route to accurate, interpretable, and generalizable drug–target binding affinity (DTBA) prediction.
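
Of the metrics listed, the Concordance Index is the one most specific to affinity benchmarks: it asks how often a model ranks a pair of drug–target interactions in the same order as their measured affinities. The sketch below follows the definition commonly used on Davis and KIBA (prediction ties count 0.5, ground-truth ties are skipped); the affinity values are hypothetical.

```python
from itertools import combinations


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs ordered correctly by the predictions."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(y_true)), 2):
        if y_true[i] == y_true[j]:
            continue  # pairs with tied ground truth are not comparable
        comparable += 1
        # Orient the pair so that i has the larger true affinity.
        if y_true[i] < y_true[j]:
            i, j = j, i
        diff = y_pred[i] - y_pred[j]
        concordant += 1.0 if diff > 0 else 0.5 if diff == 0 else 0.0
    return concordant / comparable if comparable else 0.0


# Hypothetical measured pKd values and model predictions.
y = [5.0, 6.2, 7.1, 8.4]
print(concordance_index(y, [5.1, 7.0, 6.5, 8.0]))  # one of six pairs is swapped
```

Because CI depends only on ordering, a model can score well on CI while still having a sizeable MSE, which is why the two are usually reported together.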

(This article belongs to the Special Issue AI in Bioinformatics)

33 pages, 12641 KB

Open Access Article

Exploring the Impact of Different Clustering Algorithms on the Performance of Ensemble Learning-Based Mass Appraisal Models

by Suleyman Sisman, Abdullah Kara and Arif Cagdas Aydinoglu

Buildings 2026, 16(3), 615; https://doi.org/10.3390/buildings16030615 - 2 Feb 2026

Cited by 1 | Viewed by 863

Abstract

Mass appraisal models are increasingly used to improve valuation accuracy, yet their performance remains highly sensitive to how spatial and non-spatial data are structured before training. Clustering algorithms can be used to segment heterogeneous property groups into more homogeneous ones, potentially improving predictive performance. This study investigates the impact of different clustering algorithms (i.e., K-Means, K-Medians, and the Spatially Constrained Multivariate Clustering Algorithm (SCMCA)) on the performance of prominent ensemble learning-based mass appraisal models (i.e., Random Forest (RF), the Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), and the Light Gradient Boosting Machine (LightGBM)). Using a comprehensive real estate dataset, clustering quality is evaluated with the Silhouette, Calinski–Harabasz, and Davies–Bouldin indices, and the performance of the cluster-based ensemble mass appraisal models is then compared. The best performance is achieved with the SCMCA–LightGBM combination, which reaches RMSE = 0.061 and R² = 0.722. Furthermore, clustering-based models provide improvements of up to 7.26% in MAE, 10.61% in MAPE, and 8.40% in RMSE, depending on the combination. The results show that clustering is an effective preprocessing step that can substantially enhance the predictive performance and overall quality of mass appraisal models.
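
The percentage improvements quoted above compare error metrics between model variants. The abstract does not spell out the formula, so the sketch below assumes the common convention of relative error reduction versus the corresponding model trained without clustering; the target and prediction values are hypothetical.

```python
import math


def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def mape(y, yhat):
    # In percent; assumes no target value is zero.
    return 100 * sum(abs((a - b) / a) for a, b in zip(y, yhat)) / len(y)


def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def relative_improvement(baseline_error, clustered_error):
    """ASSUMED definition: percent reduction of an error versus the baseline."""
    return 100 * (baseline_error - clustered_error) / baseline_error


# Hypothetical (transformed) property values and two sets of predictions.
y = [1.20, 0.95, 1.40, 1.10, 0.80]
baseline = [1.10, 1.05, 1.30, 1.22, 0.86]   # single global model
clustered = [1.15, 1.00, 1.33, 1.17, 0.84]  # one model per cluster
for name, metric in [("MAE", mae), ("MAPE", mape), ("RMSE", rmse)]:
    gain = relative_improvement(metric(y, baseline), metric(y, clustered))
    print(f"{name}: {gain:.2f}% improvement")
```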

(This article belongs to the Special Issue Study on Real Estate and Housing Management—2nd Edition)

26 pages, 3132 KB

Open Access Article

An Unsupervised Cloud-Centric Intrusion Diagnosis Framework Using Autoencoder and Density-Based Learning

by Suresh K. S, Thenmozhi Elumalai, Radhakrishnan Rajamani, Anubhav Kumar, Balamurugan Balusamy, Sumendra Yogarayan and Kaliyaperumal Prabu

Future Internet 2026, 18(1), 54; https://doi.org/10.3390/fi18010054 - 19 Jan 2026

Viewed by 839

Abstract

Cloud computing environments generate high-dimensional, large-scale, and highly dynamic network traffic, making intrusion diagnosis challenging due to evolving attack patterns, severe traffic imbalance, and limited availability of labeled data. To address these challenges, this study presents an unsupervised, cloud-centric intrusion diagnosis framework that integrates autoencoder-based representation learning with density-based attack categorization. A dual-stage autoencoder is trained exclusively on benign traffic to learn compact latent representations and to identify anomalous flows using reconstruction-error analysis, enabling effective anomaly detection without prior attack labels. The detected anomalies are subsequently grouped using density-based learning to uncover latent attack structures and support fine-grained multiclass intrusion diagnosis under varying attack densities. Experiments conducted on the large-scale CSE-CIC-IDS2018 dataset demonstrate that the proposed framework achieves an anomaly detection accuracy of 99.46%, with high recall and low false-negative rates in the optimal latent-space configuration. The density-based classification stage achieves an overall multiclass attack classification accuracy of 98.79%, effectively handling both majority and minority attack categories. Clustering quality evaluation reports a Silhouette Score of 0.9857 and a Davies–Bouldin Index of 0.0091, indicating strong cluster compactness and separability. Comparative analysis against representative supervised and unsupervised baselines confirms the framework’s scalability and robustness under highly imbalanced cloud traffic, highlighting its suitability for future Internet cloud security ecosystems.
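
Training the autoencoder only on benign traffic means its reconstruction error is small for normal flows, so flows whose error exceeds a threshold fitted on benign data are flagged as anomalous. The abstract does not give the thresholding rule; the sketch below assumes mean + k·σ of the benign errors (k = 3 is a hypothetical choice), omits the autoencoder itself, and uses hypothetical error values.

```python
from statistics import mean, pstdev


def reconstruction_error(x, x_hat):
    """Mean squared error between a flow's features and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(benign_errors, k=3.0):
    """ASSUMED rule: mean + k * std of errors on benign-only training data."""
    return mean(benign_errors) + k * pstdev(benign_errors)


def flag_anomalies(errors, threshold):
    """True where a flow's error exceeds the benign threshold."""
    return [e > threshold for e in errors]


# Hypothetical reconstruction errors of benign training flows.
benign_errors = [0.010, 0.012, 0.011, 0.009, 0.013]
threshold = fit_threshold(benign_errors)
print(round(threshold, 5), flag_anomalies([0.012, 0.25], threshold))
```

Only the flagged flows would then move on to the density-based categorization stage.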

(This article belongs to the Special Issue Cloud and Edge Computing for the Next-Generation Networks)

21 pages, 3313 KB

Open Access Article

MGF-DTA: A Multi-Granularity Fusion Model for Drug–Target Binding Affinity Prediction

by Zheng Ni, Bo Wei and Yuni Zeng

Int. J. Mol. Sci. 2026, 27(2), 947; https://doi.org/10.3390/ijms27020947 - 18 Jan 2026

Viewed by 692

Abstract

Drug–target affinity (DTA) prediction is one of the core components of drug discovery. Despite considerable advances in previous research, DTA prediction still faces several limitations: insufficient multi-modal information about drugs, the inherent sequence-length limit of protein language models, and single attention mechanisms that fail to capture critical multi-scale features. To alleviate these limitations, we developed a multi-granularity fusion model for drug–target binding affinity prediction, termed MGF-DTA. The model is composed of three fusion modules. First, it extracts deep semantic features of SMILES strings with ChemBERTa-2 and integrates them with molecular fingerprints via gated fusion to enrich the multi-modal representation of drugs. Second, it employs a residual fusion mechanism to integrate the global embeddings from ESM-2 with local features obtained by the k-mer and principal component analysis (PCA) method. Finally, a hierarchical attention mechanism extracts multi-granularity features from both drug SMILES strings and protein sequences. Comparative analysis against other mainstream methods on the Davis, KIBA, and BindingDB datasets shows that MGF-DTA exhibits outstanding performance advantages. Further, ablation studies confirm the effectiveness of the model components, and a case study illustrates its robust generalization capability.
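
Gated fusion, used in the drug branch, lets the model learn per dimension how much to trust each of two feature sources. A common formulation, assumed here rather than taken from the article, is g = σ(W[a; b] + c) and out = g ⊙ a + (1 − g) ⊙ b, where a and b have equal length. The sketch uses plain lists; the vectors and weights are hypothetical stand-ins for a ChemBERTa-2 embedding and a projected fingerprint.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(a, b, weights, bias):
    """out_i = g_i * a_i + (1 - g_i) * b_i, with g = sigmoid(W [a; b] + c).

    weights has one row of length 2 * len(a) per output dimension.
    """
    concat = a + b  # list concatenation: [a; b]
    fused = []
    for row, c, ai, bi in zip(weights, bias, a, b):
        g = sigmoid(sum(w * x for w, x in zip(row, concat)) + c)
        fused.append(g * ai + (1 - g) * bi)
    return fused


# Hypothetical 2-D semantic embedding and projected fingerprint.
semantic = [1.0, 2.0]
fingerprint = [3.0, 4.0]
W = [[0.0] * 4, [0.0] * 4]  # untrained gate: g = 0.5, i.e. a plain average
print(gated_fusion(semantic, fingerprint, W, [0.0, 0.0]))
```

In a trained model W and c are learned, so the gate can lean towards the language-model features for some dimensions and towards the fingerprint for others.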

(This article belongs to the Special Issue Computational Approaches in Drug Discovery and Design: From Molecular Modeling to Translational Applications)

36 pages, 6828 KB

Open Access Article

Discriminating Music Sequences Method for Music Therapy—DiMuSe

by Emil A. Canciu, Florin Munteanu, Valentin Muntean and Dorin-Mircea Popovici

Appl. Sci. 2026, 16(2), 851; https://doi.org/10.3390/app16020851 - 14 Jan 2026

Viewed by 501

Abstract

The purpose of this research was to investigate whether music empirically associated with therapeutic effects contains intrinsic informational structures that differentiate it from other sound sequences. Drawing on ontology, phenomenology, nonlinear dynamics, and complex systems theory, we hypothesize that therapeutic relevance may be linked to persistent structural patterns embedded in musical signals rather than to stylistic or genre-related attributes. This paper introduces the Discriminating Music Sequences (DiMuSe) method, an unsupervised, structure-oriented analytical framework designed to detect such patterns. The method applies 24 scalar evaluators derived from statistics, fractal geometry, nonlinear physics, and complex systems, transforming sound sequences into multidimensional vectors that characterize their global temporal organization. Principal Component Analysis (PCA) reduces this feature space to three dominant components (PC1–PC3), enabling visualization and comparison in a reduced informational space. Unsupervised k-Means clustering is subsequently applied in the PCA space to identify groups of structurally similar sound sequences, with cluster quality evaluated using Silhouette and Davies–Bouldin indices. Beyond clustering, DiMuSe implements ranking procedures based on relative positions in the PCA space, including distance to cluster centroids, inter-item proximity, and stability across clustering configurations, allowing melodies to be ordered according to their structural proximity to the therapeutic cluster. The method was first validated using synthetically generated nonlinear signals with known properties, confirming its capacity to discriminate structured time series. It was then applied to a dataset of 39 music and sound sequences spanning therapeutic, classical, folk, religious, vocal, natural, and noise categories. 
The results show that therapeutic music consistently forms a compact and well-separated cluster and ranks highly in structural proximity measures, suggesting shared informational characteristics. Notably, pink noise and ocean sounds also cluster near therapeutic music, aligning with independent evidence of their regulatory and relaxation effects. DiMuSe-derived rankings were consistent with two independent studies that identified the same musical pieces as highly therapeutic. The present research remains at a theoretical stage. Our method has not yet been tested in clinical or experimental therapeutic settings and does not account for individual preference, cultural background, or personal music history, all of which strongly influence therapeutic outcomes. Consequently, DiMuSe does not claim to predict individual efficacy but rather to identify structural potential at the signal level. Future work will focus on clinical validation, integration of biometric feedback, and the development of personalized extensions that combine intrinsic informational structure with listener-specific response data.
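
One of the ranking procedures described, ordering items by distance to the therapeutic cluster's centroid in PC1–PC3 space, can be sketched in a few lines. The 24 evaluators, PCA, and k-means steps are not reproduced; the item names and PC coordinates below are hypothetical.

```python
import math


def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def rank_by_proximity(items, reference_cluster):
    """Order items by Euclidean distance to the reference cluster's centroid, closest first."""
    c = centroid([items[name] for name in reference_cluster])
    return sorted(items, key=lambda name: math.dist(items[name], c))


# Hypothetical (PC1, PC2, PC3) scores for a few sound sequences.
items = {
    "therapeutic_a": (0.0, 0.0, 0.0),
    "therapeutic_b": (0.2, 0.0, 0.0),
    "therapeutic_c": (0.1, 0.3, 0.0),
    "pink_noise": (0.4, 0.3, 0.1),
    "folk_song": (1.5, 1.0, -0.5),
    "white_noise": (3.0, -2.0, 1.0),
}
ranking = rank_by_proximity(items, ["therapeutic_a", "therapeutic_b", "therapeutic_c"])
print(ranking)
```

Distances are only meaningful if the PCA inputs were put on comparable scales beforehand, since otherwise the leading components would track whichever evaluator has the largest raw variance.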

34 pages, 3376 KB

Open Access Article

Lexicographic Preferences Similarity for Coalition Formation in Complex Markets: Introducing PLPSim, HRECS, ContractLex, PriceLex, F@Lex, and PLPGen

by Faria Nassiri-Mofakham, Shadi Farid and Katsuhide Fujita

Information 2026, 17(1), 62; https://doi.org/10.3390/info17010062 - 9 Jan 2026

Viewed by 489

Abstract

Lexicographic preference trees (LP-Trees) provide a compact and expressive representation for modeling complex decision-making scenarios, yet measuring similarity between complete or partial structures remains a challenge. This study introduces PLPSim, a novel metric for quantifying alignment between partial lexicographic preference trees (PLP-Trees), and develops three coalition formation algorithms—HRECS1, HRECS2, and HRECS3—that leverage PLPSim to group agents with similar preferences. We further propose ContractLex and PriceLex protocols (comprising CLF, CFB, CFW, CFA, CFP) for coalition-based contract and pricing strategies, along with a new evaluation metric, F@Lex, which is designed to assess satisfaction under lexicographic preferences. To illustrate the framework, we generate a synthetic dataset (PLPGen) contextualized in a hybrid renewable energy market, where consumers’ PLP-Trees are aggregated and matched with suppliers’ tariff contracts. Experiments across 162 market scenarios, evaluated using Normalized Discounted Cumulative Gain (nDCG), Davies–Bouldin dispersion, and F@Lex, demonstrate that PLPSim-based coalitions outperform baseline approaches. The combination HRECS3 + CFP yields the highest consumer satisfaction, while HRECS3 + CFB achieves balanced satisfaction for both consumers and suppliers. While electricity tariffs and renewable energy contracts—static and dynamic—serve as the motivating example, the proposed framework generalizes to diverse multi-agent systems, offering a foundation for preference-driven coalition formation, adaptive policy design, and sustainable market optimization.
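
To make "lexicographic preference" concrete, the sketch below encodes its simplest form, an unconditional and complete LP-tree: attributes are ranked by importance, each attribute's values are ranked best-first, and the first attribute on which two outcomes differ decides. This is not PLPSim or the article's PLP-tree model, which also handles partial and conditional structures; the attributes and tariffs are hypothetical.

```python
# Hypothetical unconditional lexicographic preference: attributes in
# decreasing importance, each with its values listed best-first.
PREFERENCE = [
    ("source", ["solar", "wind", "grid"]),
    ("contract", ["dynamic", "static"]),
    ("price", ["low", "medium", "high"]),
]


def lex_key(outcome, preference=PREFERENCE):
    """Tuple of value ranks; Python compares tuples lexicographically,
    so a smaller key means a more preferred outcome."""
    return tuple(values.index(outcome[attr]) for attr, values in preference)


def prefers(x, y, preference=PREFERENCE):
    """True if outcome x is strictly preferred to outcome y."""
    return lex_key(x, preference) < lex_key(y, preference)


tariffs = [
    {"name": "T1", "source": "wind", "contract": "dynamic", "price": "low"},
    {"name": "T2", "source": "solar", "contract": "static", "price": "high"},
    {"name": "T3", "source": "solar", "contract": "dynamic", "price": "medium"},
    {"name": "T4", "source": "grid", "contract": "dynamic", "price": "low"},
]
ranked = sorted(tariffs, key=lex_key)
print([t["name"] for t in ranked])
```

The example shows the defining property: an expensive solar tariff beats a cheap wind tariff because the more important attribute (source) settles the comparison before price is ever consulted.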

34 pages, 5124 KB

Open Access Article

A Deep Ship Trajectory Clustering Method Based on Feature Embedded Representation Learning

by Yifei Liu, Zhangsong Shi, Bing Fu, Jiankang Ke, Huihui Xu and Xuan Wang

J. Mar. Sci. Eng. 2026, 14(1), 81; https://doi.org/10.3390/jmse14010081 - 31 Dec 2025

Viewed by 644

Abstract

Trajectory clustering is of great significance for identifying behavioral patterns and vessel types of non-cooperative ships. However, existing trajectory clustering methods suffer from limitations in extracting cross-spatiotemporal scale features and modeling the coupling relationship between positional and motion features, which restricts clustering performance. To address this, this study proposes a deep ship trajectory clustering method based on feature embedding representation learning (ERL-DTC). The method designs a Temporal Attention-based Multi-scale feature Aggregation Network (TA-MAN) to achieve dynamic fusion of trajectory features from micro to macro scales. A Dual-feature Self-attention Fusion Encoder (DualSFE) is employed to decouple and jointly represent the spatiotemporal position and motion features of trajectories. A two-stage optimization strategy of “pre-training and joint training” is adopted, combining contrastive loss and clustering loss to jointly constrain the embedding representation learning, ensuring it preserves trajectory similarity relationships while being adapted to the clustering task. Experiments on a public vessel trajectory dataset show that for a four-class task (K = 4), ERL-DTC improves ACC by approximately 14.1% compared to the current best deep clustering method, with NMI and ARI increasing by about 28.9% and 30.2%, respectively. It achieves the highest Silhouette Coefficient (SC) and the lowest Davies-Bouldin Index (DBI), indicating a tighter and more clearly separated cluster structure. Furthermore, its inference efficiency is improved by two orders of magnitude compared to traditional point-matching-based methods, without significantly increasing runtime due to model complexity. Ablation studies and parameter sensitivity analysis further validate the necessity of each module design and the rationality of hyperparameter settings. 
This research provides an efficient and robust solution for feature learning and clustering of vessel trajectories across spatiotemporal scales.
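
The ACC figure reported for the four-class task is unsupervised clustering accuracy: cluster ids are arbitrary, so each cluster is matched one-to-one to a class in the way that maximizes agreement before counting hits. The sketch below finds that matching by brute force over permutations, which is adequate for small K such as K = 4 (the Hungarian algorithm scales better); the label vectors are hypothetical.

```python
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    """Best one-to-one mapping of cluster ids to class labels, as a hit rate."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, hits)
    return best / len(y_true)


# Hypothetical vessel-type labels and cluster assignments (K = 4).
truth = [0, 0, 1, 1, 2, 2, 3, 3]
found = [2, 2, 0, 1, 3, 3, 1, 1]  # ids permuted, one trajectory misassigned
print(clustering_accuracy(truth, found))
```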

(This article belongs to the Section Ocean Engineering)

29 pages, 8289 KB

Open Access Article

Clustering as a Prerequisite for Reliable Machine Learning Prediction of Multi-Odor Systems in Wastewater Treatment

by Su-chul Yoon, Chae-ho Kim and Dong-chul Shin

Atmosphere 2026, 17(1), 18; https://doi.org/10.3390/atmos17010018 - 23 Dec 2025

Viewed by 826

Abstract

Complex odor emissions from wastewater treatment plants consist of multiple volatile compounds that exhibit heterogeneous temporal dynamics and low linear correlations, making accurate prediction and interpretation difficult when analyzed on a single-compound basis. This study investigates whether clustering can serve not only as an exploratory tool but as an essential preprocessing step to enhance machine-learning performance in multi-odor prediction systems. A total of 22 designated odorants were continuously monitored, and their pairwise dependencies were evaluated using Pearson correlation and mutual information. Data-driven clustering was performed through K-means, hierarchical linkage, and principal-component–based latent grouping, and the resulting structures were quantitatively compared with functional-group-based chemical classifications using the consistency ratio and Jaccard similarity index. Cluster validity was further examined using the Silhouette Coefficient, Davies–Bouldin Index, and Calinski–Harabasz Index. The predictive contribution of clustering was verified by training XGBoost regression models on both raw and cluster-structured datasets. The clustered dataset yielded higher predictive accuracy, with increased R² and reduced MAE and RMSE across most odorants. SHAP analysis further confirmed that clustering improved model interpretability by stabilizing feature contributions and reducing noise-driven importance shifts. The findings demonstrate that clustering is not a supplementary diagnostic tool, but a prerequisite for building reliable, high-performance machine-learning models in complex odor systems. This integrative framework offers a methodological foundation for multi-odor forecasting, source tracking, and next-generation odor management platforms.
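
The comparison between data-driven clusters and functional-group classes rests on set overlap. The sketch below computes the Jaccard index |A ∩ B| / |A ∪ B| and, for each cluster, the best-matching chemical group; the consistency ratio is not defined in the abstract and is not reproduced. The cluster memberships below are hypothetical, although the compounds and their chemical families are real.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def best_match(clusters, groups):
    """For each data-driven cluster, the chemical group with the highest Jaccard index."""
    result = {}
    for cname, members in clusters.items():
        gname = max(groups, key=lambda g: jaccard(members, groups[g]))
        result[cname] = (gname, jaccard(members, groups[gname]))
    return result


# Hypothetical cluster memberships versus functional-group classes.
clusters = {
    "C1": ["hydrogen sulfide", "methyl mercaptan", "dimethyl sulfide"],
    "C2": ["ammonia", "trimethylamine", "acetaldehyde"],
}
groups = {
    "sulfur": ["hydrogen sulfide", "methyl mercaptan", "dimethyl sulfide", "dimethyl disulfide"],
    "nitrogen": ["ammonia", "trimethylamine"],
    "aldehyde": ["acetaldehyde", "propionaldehyde"],
}
matches = best_match(clusters, groups)
print(matches)
```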

(This article belongs to the Special Issue Environmental Odour (2nd Edition))
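The odor abstract pairs Pearson correlation with mutual information because compounds with "low linear correlations" can still be strongly dependent. The minimal sketch below illustrates that gap; it is not the authors' pipeline. It assumes the series are already discretized (binned) and uses a plain plug-in mutual-information estimate; the two "compound" series are hypothetical.

```python
import math
from collections import Counter

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mutual_information(x, y):
    """Plug-in mutual information (in nats) for discrete or pre-binned series."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

# Hypothetical binned readings: compound b responds quadratically to compound a,
# so the dependence is perfect but not linear.
a = [-3, -2, -1, 0, 1, 2, 3]
b = [v * v for v in a]
print(round(pearson(a, b), 6))             # 0.0
print(round(mutual_information(a, b), 3))  # 1.352
```

Pearson reports no relationship at all, while mutual information is large (here it equals the entropy of `b`, since `b` is fully determined by `a`). With continuous sensor data the mutual-information estimate depends on the binning or estimator chosen, which this sketch does not address.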


29 pages, 3021 KB

Open AccessArticle

Fog-Aware Hierarchical Autoencoder with Density-Based Clustering for AI-Driven Threat Detection in Smart Farming IoT Systems

by Manikandan Thirumalaisamy, Sumendra Yogarayan, Md Shohel Sayeed, Siti Fatimah Abdul Razak and Ramesh Shunmugam

Future Internet 2025, 17(12), 567; https://doi.org/10.3390/fi17120567 - 10 Dec 2025

Viewed by 805

Abstract

Smart farming relies heavily on IoT automation and data-driven decision making, but this growing connectivity also increases exposure to cyberattacks. Flow-based unsupervised intrusion detection is a privacy-preserving alternative to signature and payload inspection, yet it still faces three challenges: loss of subtle anomaly cues during Autoencoder (AE) compression, instability of fixed reconstruction-error thresholds, and performance degradation of clustering in noisy high-dimensional spaces. To address these issues, we propose a fog-aware two-stage hierarchical AE with latent-space gating, followed by Density-Based Spatial Clustering of Applications with Noise (DBSCAN) for attack categorization. A shallow AE compresses the input into a compact 21-dimensional latent space, reducing computational demand for fog-node deployment. A deep AE then computes reconstruction-error scores to isolate malicious behavior while denoising latent features. Only high-error latent vectors are forwarded to DBSCAN, which improves cluster separability, reduces noise sensitivity, and avoids predefined cluster counts or labels. The framework is evaluated on two benchmark datasets. On CIC IoT-DIAD 2024, it achieves 98.99% accuracy, 0.9897 F1-score, 0.895 Adjusted Rand Index (ARI), and 0.019 Davies–Bouldin Index (DBI). To examine generalizability beyond smart farming traffic, we also evaluate the framework on the CSE-CIC-IDS2018 benchmark, where it achieves 99.33% accuracy, 0.9928 F1-score, 0.9013 ARI, and 0.0174 DBI. These results confirm that the proposed model can reliably detect and categorize major cyberattack families across distinct IoT threat landscapes while remaining compatible with resource-constrained fog computing environments.

(This article belongs to the Special Issue Clustered Federated Learning for Networks)
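The intrusion-detection abstract describes gating latent vectors by reconstruction error and clustering only the high-error ones with DBSCAN. The sketch below shows that gating-then-clustering step under loud assumptions: the autoencoder outputs are taken as given, a fixed threshold is used purely for illustration (the abstract itself flags fixed thresholds as unstable, so the authors' gating rule is likely different), and DBSCAN is a minimal from-scratch version rather than the authors' code. All data values are hypothetical.

```python
import math

def reconstruction_errors(inputs, reconstructions):
    """Per-sample mean squared error between inputs and AE reconstructions."""
    return [
        sum((a - b) ** 2 for a, b in zip(x, r)) / len(x)
        for x, r in zip(inputs, reconstructions)
    ]

def gate(latents, errors, threshold):
    """Keep only latent vectors whose reconstruction error exceeds threshold."""
    return [z for z, e in zip(latents, errors) if e > threshold]

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: one cluster label per point, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Includes the point itself, matching the usual min_samples convention.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:      # not a core point (may become border later)
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:        # border point reached from a core point
                labels[j] = cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_samples:  # j is also a core point: keep expanding
                queue.extend(more)
        cluster += 1
    return labels

# Hypothetical 2-D latent vectors with precomputed reconstruction errors.
latents = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0),
           (5.1, 5.0), (5.0, 5.1), (9.0, 0.0), (0.05, 0.05)]
errors = [0.9, 0.8, 0.85, 0.95, 0.9, 0.88, 0.7, 0.01]
suspicious = gate(latents, errors, threshold=0.5)  # drops the low-error last vector
print(dbscan(suspicious, eps=0.3, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Because DBSCAN needs no preset cluster count, the two dense groups emerge on their own and the isolated vector is marked as noise, which mirrors the abstract's point about avoiding predefined cluster counts or labels.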


32 pages, 1584 KB

Open AccessArticle

Adaptive Sparse Clustering of Mixed Data Using Azzalini-Encoded Ordinal Variables

by Ismail Arjdal, Mohamed Alahiane, Echarif Elharfaoui and Mustapha Rachdi

Axioms 2025, 14(12), 902; https://doi.org/10.3390/axioms14120902 - 7 Dec 2025

Viewed by 482

Abstract

In this paper, we propose a novel sparse clustering method designed for high-dimensional mixed-type data, integrating Azzalini’s score-based encoding for ordinal variables. Our approach aims to retain the inherent nature of each variable type—continuous, ordinal, and nominal—while enhancing clustering quality and interpretability. To this end, we extend classical distance metrics and adapt the Davies–Bouldin Index (DBI) to better reflect the structure of mixed data. We also introduce a weighted formulation that accounts for the distinct contributions of variable types in the clustering process. Empirical results on simulated and real-world datasets demonstrate that our method consistently achieves better separation and coherence of clusters compared to traditional techniques, while effectively identifying the most informative variables. This work opens promising directions for clustering in complex, high-dimensional settings such as marketing analytics and customer segmentation.

(This article belongs to the Special Issue Stochastic Modeling and Optimization Techniques)
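The mixed-data abstract mentions distance metrics extended across continuous, ordinal, and nominal variables with per-variable weights. As a rough illustration only, the sketch below uses a generic Gower-style weighted dissimilarity, which is a swapped-in standard technique and not the authors' method. In particular, ordinal ranks are mapped to (r - 1)/(K - 1) as a placeholder where the paper uses Azzalini's score-based encoding, and the weights and records are hypothetical.

```python
def mixed_distance(x, y, types, ranges, levels, weights):
    """Weighted Gower-style dissimilarity between two mixed-type records.

    types[k]   : "continuous", "ordinal" or "nominal"
    ranges[k]  : observed range of continuous variable k (ignored otherwise)
    levels[k]  : number of ordered categories of ordinal variable k (ignored otherwise)
    weights[k] : non-negative weight of variable k (hypothetical weighting)
    """
    total, wsum = 0.0, 0.0
    for k, t in enumerate(types):
        if t == "continuous":
            d = abs(x[k] - y[k]) / ranges[k] if ranges[k] else 0.0
        elif t == "ordinal":
            # Placeholder encoding: rank r in 1..K mapped to (r - 1) / (K - 1),
            # so the distance is |r_x - r_y| / (K - 1). Not Azzalini's encoding.
            scale = levels[k] - 1
            d = abs(x[k] - y[k]) / scale if scale else 0.0
        else:
            # Nominal: simple matching (0 if equal, 1 otherwise).
            d = 0.0 if x[k] == y[k] else 1.0
        total += weights[k] * d
        wsum += weights[k]
    return total / wsum

# Hypothetical customer records: (age, satisfaction rank on a 1-5 scale, channel).
types = ["continuous", "ordinal", "nominal"]
a = (30.0, 2, "retail")
b = (45.0, 5, "online")
print(mixed_distance(a, b, types, ranges=[60.0, None, None],
                     levels=[None, 5, None], weights=[1.0, 1.0, 2.0]))  # 0.75
```

Each variable contributes a dissimilarity in [0, 1] on its own terms, and the weights decide how much each type counts; here the nominal mismatch is weighted double, pulling the overall distance toward 1.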


14 pages, 954 KB

Open AccessArticle

Comparison of K-Means and Hierarchical Clustering Methods for Buffalo Milk Production Data

by Lucia Trapanese, Giovanna Bifulco, Matteo Santinello, Nicola Pasquino, Giuseppe Campanile and Angela Salzano

Animals 2025, 15(22), 3246; https://doi.org/10.3390/ani15223246 - 9 Nov 2025

Cited by 1 | Viewed by 1212

Abstract

This study investigated the use of K-means and hierarchical clustering to group Italian Mediterranean buffalo using routinely collected test-day records. The analysis was first conducted on a combined dataset comprising three buffalo herds and subsequently on each herd individually. The main objective was to determine whether data-driven groupings could be implemented to support improvements in general herd management strategies. Results indicated that K-means consistently outperformed hierarchical clustering across all datasets, as reflected by higher average silhouette scores (0.17–0.18 vs. 0.10–0.12 for K-means and hierarchical clustering, respectively), lower (more favorable) Davies–Bouldin Index values (DBI; 2.05–2.16 vs. 2.11–2.5), and higher Calinski–Harabasz Index values (CHI; 1034–3877 vs. 729–2109). K-means identified two clusters in the combined dataset and in two of the three herds, while three clusters were identified in the remaining herd. Cluster composition analysis revealed that days in milk and milk yield were the main discriminating factors when two clusters were formed. When three clusters emerged, K-means also identified a subgroup of animals that differed from the others in both age and lactation stage. These findings were supported by the analysis of variance (ANOVA), which showed statistically significant differences among most of the evaluated variables.

(This article belongs to the Special Issue Machine Learning Methods and Statistics in Ruminant Farming)
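The buffalo study ranks K-means against hierarchical clustering with three internal validity indices. The pure-Python sketch below shows how each index is computed from a labeled dataset, assuming Euclidean distance and the standard textbook definitions; it is not the authors' code, and in practice `sklearn.metrics.silhouette_score`, `davies_bouldin_score`, and `calinski_harabasz_score` would normally be used instead. The four points are a hypothetical toy example.

```python
import math

def _groups(X, labels):
    """Map each cluster label to the list of its points."""
    g = {}
    for x, l in zip(X, labels):
        g.setdefault(l, []).append(x)
    return g

def _centroid(pts):
    return [sum(c) / len(pts) for c in zip(*pts)]

def silhouette(X, labels):
    """Mean silhouette coefficient; range [-1, 1], higher is better."""
    g = _groups(X, labels)
    scores = []
    for x, l in zip(X, labels):
        own = g[l]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(math.dist(x, y) for y in own) / (len(own) - 1)  # self-distance is 0
        b = min(sum(math.dist(x, y) for y in pts) / len(pts)
                for k, pts in g.items() if k != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def davies_bouldin(X, labels):
    """Davies–Bouldin index: mean worst-case spread/separation ratio; lower is better."""
    g = _groups(X, labels)
    cents = {k: _centroid(p) for k, p in g.items()}
    spread = {k: sum(math.dist(x, cents[k]) for x in p) / len(p) for k, p in g.items()}
    worst = [max((spread[i] + spread[j]) / math.dist(cents[i], cents[j])
                 for j in g if j != i) for i in g]
    return sum(worst) / len(worst)

def calinski_harabasz(X, labels):
    """Calinski–Harabasz index: between/within dispersion ratio; higher is better."""
    g = _groups(X, labels)
    n, k = len(X), len(g)
    overall = _centroid(X)
    between, within = 0.0, 0.0
    for p in g.values():
        c = _centroid(p)
        between += len(p) * math.dist(c, overall) ** 2
        within += sum(math.dist(x, c) ** 2 for x in p)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical toy data: two tight, well-separated pairs of points.
X = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
labels = [0, 0, 1, 1]
print(round(silhouette(X, labels), 4))   # 0.7538
print(davies_bouldin(X, labels))         # 0.25
print(calinski_harabasz(X, labels))      # 32.0
```

Note that the Calinski–Harabasz value grows with the number of records, so the study's CHI ranges are only comparable between methods on the same dataset, whereas silhouette scores are bounded and easier to compare across herds.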


Search Results (81)
