Search Results (81)

Search Parameters:
Keywords = Davis dataset

19 pages, 352 KB  
Article
Denominational Differentiation and Religiosity Among the Hungarian Minority of Transylvania: Evidence from the European Values Study
by Levente Székedi
Religions 2026, 17(6), 647; https://doi.org/10.3390/rel17060647 - 27 May 2026
Viewed by 200
Abstract
The Hungarian minority of Transylvania comprises four historically received denominations—Roman Catholic, Reformed, Unitarian, and Lutheran—whose institutional profiles differ markedly despite their shared function as carriers of minority cultural identity. Using the European Values Study 2017 Romanian Hungarian minority oversample (GESIS ZA7550; N=1106), this article presents the first regression-based analysis of intra-community denominational variation in religiosity in this dataset. Four binary logistic regression models test whether denomination independently predicts church attendance, confidence in church, subjective importance of religion, and self-described religiosity type (institutional versus personalised), net of sociodemographic controls. Catholics attend services significantly more frequently than Reformed members, while Reformed members express higher confidence in their church—a practice–trust reversal explicable by the distinction between canonical obligation and ethnic embeddedness. Subjective religious importance does not vary by denomination, consistent with an identity-protection mechanism operating uniformly across confessions. Denomination does not independently predict institutional versus personalised religiosity type once sociodemographic controls are applied, with age emerging as the dominant axis of variation on this dimension. The findings engage with Davie’s believing/belonging/behaving framework and the debate on whether denominational cleavage or the secular–religious divide constitutes the primary axis of religious differentiation in contemporary Europe. Full article
(This article belongs to the Section Religions and Health/Psychology/Social Sciences)
31 pages, 1116 KB  
Article
AI-Driven Clustering-Based Stratification of Allergic Patients Towards Smart Healthcare Systems in Southern Italy
by Stefano Palazzo, Esra Hazar, Arife Uslu Gokceoglu, Giovanni Zambetta, Roberto Caldelli and Claudio Loconsole
Computers 2026, 15(5), 296; https://doi.org/10.3390/computers15050296 - 7 May 2026
Viewed by 406
Abstract
A clustering analysis was conducted to identify distinct patient subgroups with White Blood Cells (WBC) count alongside Age and Total Immunoglobulin E (IgE) biomarkers. All data were obtained from a coordinated primary care network operating in Apulia (Southern Italy). We analyzed 300 patient records, performed preprocessing and exploratory data analysis, and then applied unsupervised clustering directly to the standardized three-variable feature space (Age, WBC, and Total IgE), followed by supervised validation steps. Several algorithms were applied for clustering. Among the evaluated methods, K-means and Spectral Clustering showed the most favorable internal validation profiles, based on Silhouette Score (SS), Calinski–Harabasz Index (CH), and Davies–Bouldin Index (DB). K-means achieved the best scores (SS = 0.406, CH = 190.00, DB = 0.900), closely followed by Spectral Clustering (SS = 0.398, CH = 182.57, DB = 0.936), outperforming Agglomerative Clustering (SS = 0.361, CH = 160.41, DB = 1.016) and Gaussian Mixture Models (SS = 0.233, CH = 103.89, DB = 1.289). Post-clustering ANOVA analyses indicated significant differences in WBC, age, and total IgE across the five consensus clusters. An evaluation of cluster internal separability occurred through the training of a Random Forest classifier to predict cluster membership. The results indicate internal cluster separability within the analyzed dataset, but more external verification and clinical evidence are necessary for validation. The research group established clinical descriptions along with suggested treatment plans and detected co-existing diseases to help validate model-based findings. A simplified cluster-informed clinical summary based on biomarker ranges was derived to support interpretation of the identified patient profiles. 
This integrated method preliminarily suggests that patient strata may be identified from routine clinical variables, while highlighting the importance of internal validation and clinical interpretability in clustering research. Full article
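For readers comparing the internal validation scores quoted in this abstract, here is a minimal sketch of the Davies–Bouldin index in plain Python. It is not the authors' code: the function names and the toy points are illustrative assumptions, and it uses the common definition based on Euclidean distance and centroid scatter. Lower values indicate tighter, better separated clusters.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points).

    Assumes at least two clusters with distinct centroids.
    """
    centers = [centroid(c) for c in clusters]
    # Scatter: mean Euclidean distance from each point to its own centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    worst_ratios = []
    for i in range(k):
        # For cluster i, keep the most similar (worst separated) other cluster.
        ratios = [
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        ]
        worst_ratios.append(max(ratios))
    return sum(worst_ratios) / k


# Hypothetical toy data: two compact clusters far apart.
tight = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
# Same centroids, but each cluster is four times more spread out.
loose = [[[0.0, -3.0], [0.0, 5.0]], [[10.0, -3.0], [10.0, 5.0]]]
```

On the toy data the compact clusters score lower than the spread-out ones, which is the direction of the comparison in the abstract (0.900 for K-means against 1.289 for Gaussian Mixture Models).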
33 pages, 3735 KB  
Article
Artificial Neural Network-Based Classification of Industrial Sustainability Profiles for Differentiated Fiscal Policy Design in Remanufacturing Processes
by Marta Lilia Eraña-Díaz, Juana Enríquez-Urbano, Beatriz Martínez-Bahena, Jazmin Yanel Juárez-Chávez, Alfonso D’Granda-Trejo and Javier De-la-Rosa-Mondragon
Processes 2026, 14(9), 1501; https://doi.org/10.3390/pr14091501 - 6 May 2026
Viewed by 464
Abstract
The design of differentiated fiscal instruments for industrial sustainability requires robust, data-driven tools capable of capturing the heterogeneity of environmental performance across manufacturing units—a challenge that conventional econometric approaches address only partially, given the non-linear nature of operational–environmental interactions in reconfigurable production systems. This study introduces a two-phase computational framework that integrates unsupervised machine learning and supervised classification to generate evidence-based sustainability profiles for fiscal policy targeting. Its principal contribution is the combination of K-Means clustering with a binary artificial neural network (ANN) classifier, operationalized through an accessible decision-support interface that enables differentiated incentive allocation without requiring programming expertise from policymakers. A dataset of 1000 manufacturing records comprising seven operational and technological input variables—material usage, production capacity, reconfiguration time, downtime, AI optimization, IoT connectivity, and predictive maintenance—and three environmental output indicators—energy consumption, carbon emissions, and waste generation—was analyzed. In Phase One, K-Means segmentation with k = 6, selected through multi-criteria convergence (Silhouette = 0.102; Elbow, Davies–Bouldin, and Calinski–Harabasz indices), identified six distinct sustainability profiles with marked environmental differentiation. In Phase Two, a binary ANN classifier (architecture: 7 → 64 → 32 → 1 neurons; ReLU and sigmoid activations) was trained to distinguish the reference cluster C0 (low environmental impact: energy 145.1 kWh, emissions 45.2 CO2-eq) from the high-impact cluster C1 (emissions 67.8 CO2-eq, waste 41.5 kg). 
The trained classifier achieved an overall accuracy of 75.4% and an AUC-ROC of 0.774 on the held-out test set, with a macro-averaged F1-score of 0.753 and a Cohen’s kappa coefficient of 0.508, indicating moderate-to-substantial agreement beyond chance. Class C1 (high-impact establishments) achieved a precision of 0.794 and a recall of 0.730, supporting reliable identification of manufacturing units that would most benefit from targeted fiscal support. The framework is deployed through a Gradio-based graphical interface incorporating a traffic-light sustainability classification (green/yellow/red), enabling direct and interactive application by tax authorities and industrial policymakers. The modular architecture supports adaptation to larger or sector-specific datasets, making it transferable across industrial policy contexts. Full article
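The classification figures in this abstract (accuracy, macro-averaged F1, Cohen's kappa) can all be derived from a binary confusion matrix. The sketch below shows the arithmetic in plain Python. The function name and the example counts are hypothetical and are not taken from the article, whose confusion matrix is not given here.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, macro-F1, and Cohen's kappa from binary confusion counts.

    Assumes every class is both present and predicted at least once,
    so that none of the denominators below is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    def f1(hit, false_alarm, miss):
        precision = hit / (hit + false_alarm)
        recall = hit / (hit + miss)
        return 2 * precision * recall / (precision + recall)

    # Macro-F1 is the unweighted mean of the per-class F1 scores.
    f1_positive = f1(tp, fp, fn)
    f1_negative = f1(tn, fn, fp)
    macro_f1 = (f1_positive + f1_negative) / 2

    # Chance agreement: product of predicted and actual marginals per class.
    p_chance = (
        ((tp + fp) / total) * ((tp + fn) / total)
        + ((tn + fn) / total) * ((tn + fp) / total)
    )
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "macro_f1": macro_f1, "kappa": kappa}


# Hypothetical counts for a balanced 100-sample test set.
example = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

When the true classes are balanced, chance agreement is 0.5 and kappa reduces to 2 × accuracy − 1. The reported pair (accuracy 75.4%, kappa 0.508) fits that relation, although the abstract does not state the class balance of the test set.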
45 pages, 9546 KB  
Article
Unsupervised Hierarchical Visual Taxonomy of Marble Natural Stone Using Cluster-Aware Self-Supervised Vision Transformers
by Margarida Figueiredo, Carlos M. A. Diogo, Gustavo Paneiro, Pedro Amaral and António Alves de Campos
Appl. Sci. 2026, 16(9), 4137; https://doi.org/10.3390/app16094137 - 23 Apr 2026
Viewed by 281
Abstract
The marble industry relies on proprietary commercial names rather than objective visual categories, creating market inefficiencies for stakeholders who select stones based on appearance. Supervised classification perpetuates this problem by replicating inconsistent commercial labels instead of discovering intrinsic visual structure. We propose an unsupervised pipeline combining a two-stage training strategy: A pure self-supervised pretraining followed by cluster-aware fine-tuning of a DINO Vision Transformer, with empirically selected dimensionality reduction and agglomerative hierarchical clustering. Systematic ablation studies on 1480 marble images spanning 10 commercial varieties validate each design choice: cluster-aware training at k = 10 yields geometrically improved embeddings over the self-supervised baseline (mean Silhouette Score 0.693 ± 0.053 vs. 0.660 ± 0.030; mean Davies–Bouldin Index 0.386 ± 0.075 vs. 0.569 ± 0.012; N = 9 independent evaluations across 3 data partitions × 3 training initializations). The resulting taxonomy reveals three phenomena invisible to commercial classification: cross-category merging of visually indistinguishable stones carrying different market names, intra-category splitting of heterogeneous sub-populations within single varieties, and coherent grouping where commercial and visual boundaries coincide, with all three confirmed in every independent run. We further demonstrate that standard extrinsic metrics are misaligned with unsupervised taxonomy objectives when reference labels encode the inconsistencies the method aims to resolve. Validating this methodology across diverse stone types, larger datasets, and varied acquisition conditions represents a natural and necessary next step toward establishing its cross-domain generalizability. Full article
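The Silhouette Score used in this abstract to compare embeddings can be sketched in a few lines of plain Python. This is an illustrative implementation with Euclidean distance, not the authors' pipeline, and the function name and toy points are assumptions. Values near 1 mean points sit much closer to their own cluster than to the nearest other cluster.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points.

    Assumes at least two clusters. Points in singleton clusters score 0,
    the usual convention. Degenerate data where a point has zero distance
    to every other point is not handled.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        # (the zero distance from the point to itself adds nothing to the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two well separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])
bad = silhouette_score(toy_points, [0, 1, 0, 1])
```

The grouping that matches the geometry (`good`) scores close to 1, while the grouping that pairs distant points (`bad`) scores below 0.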
19 pages, 2380 KB  
Article
DTBAffinity: A Multi-Modal Feature Engineering and Gradient-Boosting Framework for Drug–Target Binding Affinity on Davis and KIBA Benchmarks
by Meshari Alazmi
Computers 2026, 15(3), 182; https://doi.org/10.3390/computers15030182 - 10 Mar 2026
Viewed by 832
Abstract
Accurately predicting how strongly a drug binds to its target is central to drug discovery: it helps prioritize the most promising compounds and reduces the number of costly experiments. We present DTBAffinity, a multi-modal regression framework that integrates chemically meaningful ligand descriptors with diverse protein sequence features in a unified gradient-boosting model. Ligands are represented by physicochemical and topological descriptors (RDKit and Mordred), structural keys (MACCS and FP4), circular fingerprints (ECFP/Morgan), and SMILES-derived features from iFeatureOmega. Proteins are represented by thousands of sequence-derived descriptors (composition, autocorrelations, physicochemical profiles, and evolutionary indices) from iFeatureOmega, together with contextual embeddings from large protein language models (ESM-1b, ESM-2). The feature matrices are cleaned, variance-filtered, z-score scaled, and reduced by univariate selection before being concatenated and modeled with regularized XGBoost ensembles. We evaluate DTBAffinity on two commonly used kinase-centric datasets: Davis (30,056 interactions, pKd values) and KIBA (118,254 interactions, integrated affinity scores). Performance is measured with MSE, R², Pearson/Spearman correlations, Concordance Index (CI), r_m², and AUPR. On Davis, DTBAffinity yields MSE = 0.1885, CI = 0.9102, and AUPR = 0.8112; on KIBA, it yields MSE = 0.1540, CI = 0.8686, and AUPR = 0.8361, outperforming state-of-the-art baselines such as KronRLS, SimBoost, DeepDTA, and GraphDTA. These findings suggest that combining interpretable descriptors with contextual embeddings in a robust boosting framework is an effective route to accurate, interpretable, and generalizable DTBA prediction. Full article
(This article belongs to the Special Issue AI in Bioinformatics)
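MSE and the Concordance Index (CI) are the two headline metrics reported for the Davis and KIBA benchmarks above. The sketch below shows one common way to compute them in plain Python. It is not the DTBAffinity evaluation code: the function names and the affinity values are hypothetical, and the tie handling (half credit for tied predictions, tied true values skipped) is a stated assumption.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true affinity are not comparable and are skipped.
    Pairs with equal predictions earn half credit.
    Raises ZeroDivisionError if no pair is comparable.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            true_diff = y_true[i] - y_true[j]
            if true_diff == 0:
                continue
            comparable += 1
            pred_diff = y_pred[i] - y_pred[j]
            if pred_diff == 0:
                concordant += 0.5
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    return concordant / comparable


# Hypothetical pKd-like values, for illustration only.
measured = [5.0, 6.2, 7.1, 8.4]
predicted = [5.1, 6.0, 7.5, 7.5]
ci = concordance_index(measured, predicted)
mse = mean_squared_error(measured, predicted)
```

This pairwise loop is quadratic in the number of interactions. That is fine for a sketch, but for a benchmark of the size quoted above (30,056 Davis interactions) a vectorized or sort-based implementation would normally be preferred.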
33 pages, 12641 KB  
Article
Exploring the Impact of Different Clustering Algorithms on the Performance of Ensemble Learning-Based Mass Appraisal Models
by Suleyman Sisman, Abdullah Kara and Arif Cagdas Aydinoglu
Buildings 2026, 16(3), 615; https://doi.org/10.3390/buildings16030615 - 2 Feb 2026
Cited by 1 | Viewed by 863
Abstract
Mass appraisal models are increasingly used to improve valuation accuracy, yet their performance remains highly sensitive to how spatial and non-spatial data are structured before training. Clustering algorithms can segment heterogeneous property groups into more homogeneous ones, potentially improving predictive performance. This study investigates the impact of different clustering algorithms (K-Means, K-Medians, and the Spatially Constrained Multivariate Clustering Algorithm (SCMCA)) on the performance of prominent ensemble learning-based mass appraisal models (Random Forest (RF), the Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), and the Light Gradient Boosting Machine (LightGBM)). Using a comprehensive real estate dataset, clustering quality is evaluated using the Silhouette, Calinski–Harabasz, and Davies–Bouldin indices, and the performance of cluster-based ensemble mass appraisal models is then compared. The best performance is achieved with the SCMCA–LightGBM combination, which reached RMSE = 0.061 and R² = 0.722. Clustering-based models provide improvements of up to 7.26% in MAE, 10.61% in MAPE, and 8.40% in RMSE, depending on the combination. The results show that clustering is an effective preprocessing step that can substantially enhance the predictive performance and overall quality of mass appraisal models. Full article
(This article belongs to the Special Issue Study on Real Estate and Housing Management—2nd Edition)
26 pages, 3132 KB  
Article
An Unsupervised Cloud-Centric Intrusion Diagnosis Framework Using Autoencoder and Density-Based Learning
by Suresh K. S, Thenmozhi Elumalai, Radhakrishnan Rajamani, Anubhav Kumar, Balamurugan Balusamy, Sumendra Yogarayan and Kaliyaperumal Prabu
Future Internet 2026, 18(1), 54; https://doi.org/10.3390/fi18010054 - 19 Jan 2026
Viewed by 839
Abstract
Cloud computing environments generate high-dimensional, large-scale, and highly dynamic network traffic, making intrusion diagnosis challenging due to evolving attack patterns, severe traffic imbalance, and limited availability of labeled data. To address these challenges, this study presents an unsupervised, cloud-centric intrusion diagnosis framework that integrates autoencoder-based representation learning with density-based attack categorization. A dual-stage autoencoder is trained exclusively on benign traffic to learn compact latent representations and to identify anomalous flows using reconstruction-error analysis, enabling effective anomaly detection without prior attack labels. The detected anomalies are subsequently grouped using density-based learning to uncover latent attack structures and support fine-grained multiclass intrusion diagnosis under varying attack densities. Experiments conducted on the large-scale CSE-CIC-IDS2018 dataset demonstrate that the proposed framework achieves an anomaly detection accuracy of 99.46%, with high recall and low false-negative rates in the optimal latent-space configuration. The density-based classification stage achieves an overall multiclass attack classification accuracy of 98.79%, effectively handling both majority and minority attack categories. Clustering quality evaluation reports a Silhouette Score of 0.9857 and a Davies–Bouldin Index of 0.0091, indicating strong cluster compactness and separability. Comparative analysis against representative supervised and unsupervised baselines confirms the framework’s scalability and robustness under highly imbalanced cloud traffic, highlighting its suitability for future Internet cloud security ecosystems. Full article
(This article belongs to the Special Issue Cloud and Edge Computing for the Next-Generation Networks)
21 pages, 3313 KB  
Article
MGF-DTA: A Multi-Granularity Fusion Model for Drug–Target Binding Affinity Prediction
by Zheng Ni, Bo Wei and Yuni Zeng
Int. J. Mol. Sci. 2026, 27(2), 947; https://doi.org/10.3390/ijms27020947 - 18 Jan 2026
Viewed by 692
Abstract
Drug–target affinity (DTA) prediction is one of the core components of drug discovery. Despite considerable advances in previous research, DTA tasks still face several limitations with insufficient multi-modal information of drugs, the inherent sequence length limitation of protein language models, and single attention mechanisms that fail to capture critical multi-scale features. To alleviate the above limitations, we developed a multi-granularity fusion model for drug–target binding affinity prediction, termed MGF-DTA. This model is composed of three fusion modules, specifically as follows. First, the model extracts deep semantic features of SMILES strings through ChemBERTa-2 and integrates them with molecular fingerprints by using gated fusion to enhance the multi-modal information of drugs. In addition, it employs a residual fusion mechanism to integrate the global embeddings from ESM-2 with the local features obtained by the k-mer and principal component analysis (PCA) method. Finally, a hierarchical attention mechanism is employed to extract multi-granularity features from both drug SMILES strings and protein sequences. Comparative analysis with other mainstream methods on the Davis, KIBA, and BindingDB datasets reveals that the MGF-DTA model exhibits outstanding performance advantages. Further, ablation studies confirm the effectiveness of the model components and case study illustrates its robust generalization capability. Full article
36 pages, 6828 KB  
Article
Discriminating Music Sequences Method for Music Therapy—DiMuSe
by Emil A. Canciu, Florin Munteanu, Valentin Muntean and Dorin-Mircea Popovici
Appl. Sci. 2026, 16(2), 851; https://doi.org/10.3390/app16020851 - 14 Jan 2026
Viewed by 501
Abstract
The purpose of this research was to investigate whether music empirically associated with therapeutic effects contains intrinsic informational structures that differentiate it from other sound sequences. Drawing on ontology, phenomenology, nonlinear dynamics, and complex systems theory, we hypothesize that therapeutic relevance may be linked to persistent structural patterns embedded in musical signals rather than to stylistic or genre-related attributes. This paper introduces the Discriminating Music Sequences (DiMuSe) method, an unsupervised, structure-oriented analytical framework designed to detect such patterns. The method applies 24 scalar evaluators derived from statistics, fractal geometry, nonlinear physics, and complex systems, transforming sound sequences into multidimensional vectors that characterize their global temporal organization. Principal Component Analysis (PCA) reduces this feature space to three dominant components (PC1–PC3), enabling visualization and comparison in a reduced informational space. Unsupervised k-Means clustering is subsequently applied in the PCA space to identify groups of structurally similar sound sequences, with cluster quality evaluated using Silhouette and Davies–Bouldin indices. Beyond clustering, DiMuSe implements ranking procedures based on relative positions in the PCA space, including distance to cluster centroids, inter-item proximity, and stability across clustering configurations, allowing melodies to be ordered according to their structural proximity to the therapeutic cluster. The method was first validated using synthetically generated nonlinear signals with known properties, confirming its capacity to discriminate structured time series. It was then applied to a dataset of 39 music and sound sequences spanning therapeutic, classical, folk, religious, vocal, natural, and noise categories. 
The results show that therapeutic music consistently forms a compact and well-separated cluster and ranks highly in structural proximity measures, suggesting shared informational characteristics. Notably, pink noise and ocean sounds also cluster near therapeutic music, aligning with independent evidence of their regulatory and relaxation effects. DiMuSe-derived rankings were consistent with two independent studies that identified the same musical pieces as highly therapeutic. The present research remains at a theoretical stage. Our method has not yet been tested in clinical or experimental therapeutic settings and does not account for individual preference, cultural background, or personal music history, all of which strongly influence therapeutic outcomes. Consequently, DiMuSe does not claim to predict individual efficacy but rather to identify structural potential at the signal level. Future work will focus on clinical validation, integration of biometric feedback, and the development of personalized extensions that combine intrinsic informational structure with listener-specific response data. Full article
34 pages, 3376 KB  
Article
Lexicographic Preferences Similarity for Coalition Formation in Complex Markets: Introducing PLPSim, HRECS, ContractLex, PriceLex, F@Lex, and PLPGen
by Faria Nassiri-Mofakham, Shadi Farid and Katsuhide Fujita
Information 2026, 17(1), 62; https://doi.org/10.3390/info17010062 - 9 Jan 2026
Viewed by 489
Abstract
Lexicographic preference trees (LP-Trees) provide a compact and expressive representation for modeling complex decision-making scenarios, yet measuring similarity between complete or partial structures remains a challenge. This study introduces PLPSim, a novel metric for quantifying alignment between partial lexicographic preference trees (PLP-Trees) and develops three coalition formation algorithms—HRECS1, HRECS2, and HRECS3—that leverage PLPSim to group agents with similar preferences. We further propose ContractLex and PriceLex protocols (comprising CLF, CFB, CFW, CFA, CFP) for coalition-based contract and pricing strategies, along with a new evaluation metric, F@Lex, which is designed to assess satisfaction under lexicographic preferences. To illustrate the framework, we generate a synthetic dataset (PLPGen) contextualized in a hybrid renewable energy market, where consumers’ PLP-Trees are aggregated and matched with suppliers’ tariff contracts. Experiments across 162 market scenarios, evaluated using Normalized Discounted Cumulative Gain (nDCG), Davies–Bouldin dispersion, and F@Lex, demonstrate that PLPSim-based coalitions outperform baseline approaches. The combination HRECS3 + CFP yields the highest consumer satisfaction, while HRECS3 + CFB achieves balanced satisfaction for both consumers and suppliers. While electricity tariffs and renewable energy contracts—static and dynamic—serve as the motivating example, the proposed framework generalizes to diverse multi-agent systems, offering a foundation for preference-driven coalition formation, adaptive policy design, and sustainable market optimization. Full article
34 pages, 5124 KB  
Article
A Deep Ship Trajectory Clustering Method Based on Feature Embedded Representation Learning
by Yifei Liu, Zhangsong Shi, Bing Fu, Jiankang Ke, Huihui Xu and Xuan Wang
J. Mar. Sci. Eng. 2026, 14(1), 81; https://doi.org/10.3390/jmse14010081 - 31 Dec 2025
Viewed by 644
Abstract
Trajectory clustering is of great significance for identifying behavioral patterns and vessel types of non-cooperative ships. However, existing trajectory clustering methods suffer from limitations in extracting cross-spatiotemporal scale features and modeling the coupling relationship between positional and motion features, which restricts clustering performance. To address this, this study proposes a deep ship trajectory clustering method based on feature embedding representation learning (ERL-DTC). The method designs a Temporal Attention-based Multi-scale feature Aggregation Network (TA-MAN) to achieve dynamic fusion of trajectory features from micro to macro scales. A Dual-feature Self-attention Fusion Encoder (DualSFE) is employed to decouple and jointly represent the spatiotemporal position and motion features of trajectories. A two-stage optimization strategy of “pre-training and joint training” is adopted, combining contrastive loss and clustering loss to jointly constrain the embedding representation learning, ensuring it preserves trajectory similarity relationships while being adapted to the clustering task. Experiments on a public vessel trajectory dataset show that for a four-class task (K = 4), ERL-DTC improves ACC by approximately 14.1% compared to the current best deep clustering method, with NMI and ARI increasing by about 28.9% and 30.2%, respectively. It achieves the highest Silhouette Coefficient (SC) and the lowest Davies-Bouldin Index (DBI), indicating a tighter and more clearly separated cluster structure. Furthermore, its inference efficiency is improved by two orders of magnitude compared to traditional point-matching-based methods, without significantly increasing runtime due to model complexity. Ablation studies and parameter sensitivity analysis further validate the necessity of each module design and the rationality of hyperparameter settings. 
This research provides an efficient and robust solution for feature learning and clustering of vessel trajectories across spatiotemporal scales. Full article
(This article belongs to the Section Ocean Engineering)
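Among the external metrics quoted for ERL-DTC, the Adjusted Rand Index (ARI) compares a predicted clustering with reference labels while correcting for chance agreement. A minimal plain-Python sketch follows. It is illustrative only: the function name and the label lists are assumptions, not data from the article.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items.

    Returns 1.0 for identical partitions (up to renaming of labels) and
    values near 0 for chance-level agreement. Undefined (division by zero)
    when both labelings put every item into a single cluster.
    """
    n = len(labels_true)
    # Pair counts within each contingency cell, each row, and each column.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    sum_cells = sum(comb(count, 2) for count in cells.values())
    sum_rows = sum(comb(count, 2) for count in rows.values())
    sum_cols = sum(comb(count, 2) for count in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (maximum - expected)


# Hypothetical labels: the same partition with swapped cluster names.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Hypothetical labels: every predicted cluster mixes both reference classes.
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because ARI depends only on which items are grouped together, renaming the clusters does not change the score. This matters when comparing an unsupervised method with labeled vessel types.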
29 pages, 8289 KB  
Article
Clustering as a Prerequisite for Reliable Machine Learning Prediction of Multi-Odor Systems in Wastewater Treatment
by Su-chul Yoon, Chae-ho Kim and Dong-chul Shin
Atmosphere 2026, 17(1), 18; https://doi.org/10.3390/atmos17010018 - 23 Dec 2025
Viewed by 826
Abstract
Complex odor emissions from wastewater treatment plants consist of multiple volatile compounds that exhibit heterogeneous temporal dynamics and low linear correlations, making accurate prediction and interpretation difficult when analyzed on a single-compound basis. This study investigates whether clustering can serve not only as an exploratory tool but as an essential preprocessing step to enhance machine-learning performance in multi-odor prediction systems. A total of 22 designated odorants were continuously monitored, and their pairwise dependencies were evaluated using Pearson correlation and mutual information. Data-driven clustering was performed through K-means, hierarchical linkage, and principal-component–based latent grouping, and the resulting structures were quantitatively compared with functional-group-based chemical classifications using the consistency ratio and Jaccard similarity index. Cluster validity was further examined using the Silhouette Coefficient, Davies–Bouldin Index, and Calinski–Harabasz Index. The predictive contribution of clustering was verified by training XGBoost regression models on both raw and cluster-structured datasets. The clustered dataset yielded higher predictive accuracy, with increased R2 and reduced MAE and RMSE across most odorants. SHAP analysis further confirmed that clustering improved model interpretability by stabilizing feature contributions and reducing noise-driven importance shifts. The findings demonstrate that clustering is not a supplementary diagnostic tool, but a prerequisite for building reliable, high-performance machine-learning models in complex odor systems. This integrative framework offers a methodological foundation for multi-odor forecasting, source tracking, and next-generation odor management platforms. Full article
(This article belongs to the Special Issue Environmental Odour (2nd Edition))
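
The abstract above compares data-driven clusters with functional-group classes using a Jaccard similarity index but does not say how it is computed. One common reading is the pair-counting Jaccard index between two partitions, sketched below. The odorant names, cluster labels, and function names are hypothetical illustrations, not values or code from the study.

```python
from itertools import combinations


def co_clustered_pairs(labels):
    """Return the set of item pairs that share a label in one partition."""
    items = sorted(labels)  # fixed order so pairs match across partitions
    return {(a, b) for a, b in combinations(items, 2) if labels[a] == labels[b]}


def pairwise_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard index: |shared pairs| / |pairs grouped in either|."""
    pairs_a = co_clustered_pairs(labels_a)
    pairs_b = co_clustered_pairs(labels_b)
    union = pairs_a | pairs_b
    if not union:  # both partitions are all singletons
        return 1.0
    return len(pairs_a & pairs_b) / len(union)


# Hypothetical example: cluster ids from an algorithm vs. chemical classes.
data_driven = {
    "hydrogen sulfide": 0, "methyl mercaptan": 0, "styrene": 0,
    "ammonia": 1, "trimethylamine": 1,
    "toluene": 2,
}
chemical = {
    "hydrogen sulfide": "sulfur", "methyl mercaptan": "sulfur",
    "ammonia": "nitrogen", "trimethylamine": "nitrogen",
    "toluene": "aromatic", "styrene": "aromatic",
}

print(pairwise_jaccard(data_driven, chemical))
```

A value of 1 means both partitions group exactly the same pairs together; values near 0 mean the data-driven clusters cut across the chemical classes.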

29 pages, 3021 KB  
Article
Fog-Aware Hierarchical Autoencoder with Density-Based Clustering for AI-Driven Threat Detection in Smart Farming IoT Systems
by Manikandan Thirumalaisamy, Sumendra Yogarayan, Md Shohel Sayeed, Siti Fatimah Abdul Razak and Ramesh Shunmugam
Future Internet 2025, 17(12), 567; https://doi.org/10.3390/fi17120567 - 10 Dec 2025
Viewed by 805
Abstract
Smart farming relies heavily on IoT automation and data-driven decision making, but this growing connectivity also increases exposure to cyberattacks. Flow-based unsupervised intrusion detection is a privacy-preserving alternative to signature and payload inspection, yet it still faces three challenges: loss of subtle anomaly cues during Autoencoder (AE) compression, instability of fixed reconstruction-error thresholds, and performance degradation of clustering in noisy high-dimensional spaces. To address these issues, we propose a fog-aware two-stage hierarchical AE with latent-space gating, followed by Density-Based Spatial Clustering of Applications with Noise (DBSCAN) for attack categorization. A shallow AE compresses the input into a compact 21-dimensional latent space, reducing computational demand for fog-node deployment. A deep AE then computes reconstruction-error scores to isolate malicious behavior while denoising latent features. Only high-error latent vectors are forwarded to DBSCAN, which improves cluster separability, reduces noise sensitivity, and avoids predefined cluster counts or labels. The framework is evaluated on two benchmark datasets. On CIC IoT-DIAD 2024, it achieves 98.99% accuracy, 0.9897 F1-score, 0.895 Adjusted Rand Index (ARI), and 0.019 Davies–Bouldin Index (DBI). To examine generalizability beyond smart farming traffic, we also evaluate the framework on the CSE-CIC-IDS2018 benchmark, where it achieves 99.33% accuracy, 0.9928 F1-score, 0.9013 ARI, and 0.0174 DBI. These results confirm that the proposed model can reliably detect and categorize major cyberattack families across distinct IoT threat landscapes while remaining compatible with resource-constrained fog computing environments.
(This article belongs to the Special Issue Clustered Federated Learning for Networks)

32 pages, 1584 KB  
Article
Adaptive Sparse Clustering of Mixed Data Using Azzalini-Encoded Ordinal Variables
by Ismail Arjdal, Mohamed Alahiane, Echarif Elharfaoui and Mustapha Rachdi
Axioms 2025, 14(12), 902; https://doi.org/10.3390/axioms14120902 - 7 Dec 2025
Viewed by 482
Abstract
In this paper, we propose a novel sparse clustering method designed for high-dimensional mixed-type data, integrating Azzalini’s score-based encoding for ordinal variables. Our approach aims to retain the inherent nature of each variable type (continuous, ordinal, and nominal) while enhancing clustering quality and interpretability. To this end, we extend classical distance metrics and adapt the Davies–Bouldin Index (DBI) to better reflect the structure of mixed data. We also introduce a weighted formulation that accounts for the distinct contributions of variable types in the clustering process. Empirical results on simulated and real-world datasets demonstrate that our method consistently achieves better separation and coherence of clusters compared to traditional techniques, while effectively identifying the most informative variables. This work opens promising directions for clustering in complex, high-dimensional settings such as marketing analytics and customer segmentation.
(This article belongs to the Special Issue Stochastic Modeling and Optimization Techniques)
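
For reference, the classical Davies–Bouldin Index that the abstract above says it adapts can be written as follows. This is the standard textbook form only; the mixed-data adaptation and the weighted formulation are not given in the abstract and are not reproduced here. The symbols k, C_i, c_i, s_i, and d are notation chosen for this sketch.

```latex
% Classical Davies–Bouldin Index for k clusters C_1, ..., C_k with centroids c_i.
% s_i is the mean distance of the points in cluster i to its centroid,
% and d(c_i, c_j) is the distance between two centroids.
\[
  s_i = \frac{1}{|C_i|} \sum_{x \in C_i} d(x, c_i),
  \qquad
  \mathrm{DBI} = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i}
  \frac{s_i + s_j}{d(c_i, c_j)}
\]
```

Lower values indicate more compact and better-separated clusters, so a method that "achieves better separation and coherence" would be expected to lower this index.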

14 pages, 954 KB  
Article
Comparison of K-Means and Hierarchical Clustering Methods for Buffalo Milk Production Data
by Lucia Trapanese, Giovanna Bifulco, Matteo Santinello, Nicola Pasquino, Giuseppe Campanile and Angela Salzano
Animals 2025, 15(22), 3246; https://doi.org/10.3390/ani15223246 - 9 Nov 2025
Cited by 1 | Viewed by 1212
Abstract
This study investigated the use of K-means and hierarchical clustering to group Italian Mediterranean buffalo using routinely collected test-day records. The analysis was first conducted on a combined dataset comprising three buffalo herds and subsequently on each herd individually. The main objective was to determine whether data-driven groupings could be implemented to support improvements in general herd management strategies. Results indicated that K-means consistently outperformed hierarchical clustering across all datasets, as reflected by higher average silhouette scores (0.17–0.18 vs. 0.10–0.12 for K-means and hierarchical, respectively), more favorable Davies–Bouldin Index values (DBI; 2.05–2.16 vs. 2.11–2.5 for K-means and hierarchical, respectively), and higher Calinski–Harabasz Index values (CHI; 1034–3877 vs. 729–2109 for K-means and hierarchical, respectively). K-means identified two clusters in the combined dataset and in two of the three herds, while three clusters were identified in the remaining herd. Cluster composition analysis revealed that days in milk and milk yield were the main discriminating factors when two clusters were formed. When three clusters emerged, K-means also identified a subgroup of animals that differed from the others in both age and lactation stage. These findings were supported by the analysis of variance (ANOVA), which showed statistically significant differences among clusters for most of the evaluated variables.
(This article belongs to the Special Issue Machine Learning Methods and Statistics in Ruminant Farming)
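
The abstract above ranks the two clustering methods partly by average silhouette score, where higher is better. The sketch below shows how that score is computed for two competing labelings of the same points, using only the standard library. The toy points, the labelings, and the function name are assumptions made for illustration and have no connection to the buffalo data.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette score over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible structure
poor_labels = [0, 1, 0, 1, 0, 1]  # mixes the two groups

print(mean_silhouette(points, good_labels))
print(mean_silhouette(points, poor_labels))
```

Scores lie between -1 and 1. Values such as 0.17–0.18, as reported in the abstract, indicate weakly separated clusters even for the better-performing method.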