Search Results (90)

Search Parameters:
Keywords = privacy-preserving clustering

26 pages, 1470 KB  
Article
A Lightweight Privacy-Enhanced Federated Clustering Algorithm for Edge Computing
by Jun Wang, Xianghua Chen, Xing Cheng, Jiantong Zhang, Tao Yu and Kewei Qian
Sensors 2025, 25(24), 7544; https://doi.org/10.3390/s25247544 - 11 Dec 2025
Viewed by 282
Abstract
In edge computing scenarios, the data generated by distributed devices is characterized by its dispersion, heterogeneity, and privacy sensitivity, posing significant challenges to federated clustering, including high communication overhead, difficulty in adapting to non-IID data, and significant privacy leakage risks. To address these issues, this paper proposes a privacy-enhanced federated k-means clustering algorithm based on locality-sensitive hashing, aiming to mine latent knowledge from multi-source distributed data while ensuring data privacy protection. The core innovation of this algorithm lies in leveraging the distance sensitivity of clustering pairs, which effectively mitigates the non-IID problem while preserving data privacy and achieves global clustering in just a single communication round, significantly enhancing its practicality in communication-constrained environments. Specifically, the algorithm first evaluates local data dispersion at the client side, dynamically generates cluster cardinality based on dispersion, and obtains initial clustering centers through the k-means algorithm. Subsequently, it employs locality-sensitive hashing to encrypt the center points, uploading only the encrypted clustering information and weight data to the server, thereby achieving privacy protection without relying on a trusted server. On the server side, a secondary weighted k-means clustering is performed in the encrypted space to generate hashed global centers. Experimental results on the MNIST and CIFAR-10 datasets demonstrate that this method maintains robust clustering performance under non-IID data distributions. Most crucially, through a strict single-round client-to-server communication protocol, this approach significantly reduces communication overhead, providing a distributed data mining solution that is efficient, adaptable, and privacy-preserving for resource-constrained edge computing environments. Full article
(This article belongs to the Section Sensor Networks)
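
The server-side step described in this abstract (a second, weighted k-means over the centers and weights uploaded by clients) can be illustrated with a minimal sketch. It is not the authors' implementation: it deliberately omits the locality-sensitive hashing step and works on plain coordinates, and the names `weighted_kmeans` and `squared_distance`, the random initialization, and the fixed iteration count are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(
    points: Sequence[Point],
    weights: Sequence[float],
    k: int,
    iterations: int = 20,
    seed: int = 0,
) -> List[Point]:
    """Cluster client-uploaded centers, weighting each by its sample count.

    Hypothetical helper: `points` stands in for the client centers and
    `weights` for the number of local samples behind each center.
    """
    rng = random.Random(seed)
    # Assumption: initialize from k distinct uploaded centers.
    centers = rng.sample(list(points), k)
    for _ in range(iterations):
        # Assignment step: attach every uploaded center to its nearest global center.
        groups: List[List[Tuple[Point, float]]] = [[] for _ in range(k)]
        for p, w in zip(points, weights):
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            groups[nearest].append((p, w))
        # Update step: each global center becomes the weighted mean of its group.
        for j, members in enumerate(groups):
            total = sum(w for _, w in members)
            if total == 0:
                continue  # keep the previous center if the group is empty
            dim = len(centers[j])
            centers[j] = tuple(
                sum(p[d] * w for p, w in members) / total for d in range(dim)
            )
    return centers
```

Because only centers and weights are needed, a single upload per client is enough for this step, which is the single-round property the abstract emphasizes.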

29 pages, 3021 KB  
Article
Fog-Aware Hierarchical Autoencoder with Density-Based Clustering for AI-Driven Threat Detection in Smart Farming IoT Systems
by Manikandan Thirumalaisamy, Sumendra Yogarayan, Md Shohel Sayeed, Siti Fatimah Abdul Razak and Ramesh Shunmugam
Future Internet 2025, 17(12), 567; https://doi.org/10.3390/fi17120567 - 10 Dec 2025
Viewed by 180
Abstract
Smart farming relies heavily on IoT automation and data-driven decision making, but this growing connectivity also increases exposure to cyberattacks. Flow-based unsupervised intrusion detection is a privacy-preserving alternative to signature and payload inspection, yet it still faces three challenges: loss of subtle anomaly cues during Autoencoder (AE) compression, instability of fixed reconstruction-error thresholds, and performance degradation of clustering in noisy high-dimensional spaces. To address these issues, we propose a fog-aware two-stage hierarchical AE with latent-space gating, followed by Density-Based Spatial Clustering of Applications with Noise (DBSCAN) for attack categorization. A shallow AE compresses the input into a compact 21-dimensional latent space, reducing computational demand for fog-node deployment. A deep AE then computes reconstruction-error scores to isolate malicious behavior while denoising latent features. Only high-error latent vectors are forwarded to DBSCAN, which improves cluster separability, reduces noise sensitivity, and avoids predefined cluster counts or labels. The framework is evaluated on two benchmark datasets. On CIC IoT-DIAD 2024, it achieves 98.99% accuracy, 0.9897 F1-score, 0.895 Adjusted Rand Index (ARI), and 0.019 Davies–Bouldin Index (DBI). To examine generalizability beyond smart farming traffic, we also evaluate the framework on the CSE-CIC-IDS2018 benchmark, where it achieves 99.33% accuracy, 0.9928 F1-score, 0.9013 ARI, and 0.0174 DBI. These results confirm that the proposed model can reliably detect and categorize major cyberattack families across distinct IoT threat landscapes while remaining compatible with resource-constrained fog computing environments. Full article
(This article belongs to the Special Issue Clustered Federated Learning for Networks)

28 pages, 1569 KB  
Article
Privacy-Preserving Hierarchical Fog Federated Learning (PP-HFFL) for IoT Intrusion Detection
by Md Morshedul Islam, Wali Mohammad Abdullah and Baidya Nath Saha
Sensors 2025, 25(23), 7296; https://doi.org/10.3390/s25237296 - 30 Nov 2025
Viewed by 412
Abstract
The rapid expansion of the Internet of Things (IoT) across critical sectors such as healthcare, energy, cybersecurity, smart cities, and finance has increased its exposure to cyberattacks. Conventional centralized machine learning-based Intrusion Detection Systems (IDS) face limitations, including data privacy risks, legal restrictions on cross-border data transfers, and high communication overhead. To overcome these challenges, we propose Privacy-Preserving Hierarchical Fog Federated Learning (PP-HFFL) for IoT intrusion detection, where fog nodes serve as intermediaries between IoT devices and the cloud, collecting and preprocessing local data and training models on behalf of IoT clusters. The framework incorporates Personalized Federated Learning (PFL) to handle heterogeneous, non-independent and identically distributed (non-IID) data and leverages differential privacy (DP) to protect sensitive information. Experiments on the RT-IoT 2022 and CIC-IoT 2023 datasets demonstrate that PP-HFFL achieves detection accuracy comparable to centralized systems, reduces communication overhead, preserves privacy, and adapts effectively across non-IID data. This hierarchical approach provides a practical and secure solution for next-generation IoT intrusion detection. Full article
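
The abstract does not say which differential privacy mechanism PP-HFFL uses. As a hedged illustration of how DP is commonly applied to a model update before it leaves a node, the sketch below clips the update to a maximum L2 norm and adds Gaussian noise scaled to that norm. The function names, `max_norm`, and `noise_multiplier` are hypothetical, and no privacy accounting (computing the resulting epsilon) is attempted.

```python
import math
import random
from typing import List, Sequence


def clip_update(update: Sequence[float], max_norm: float) -> List[float]:
    """Scale the update down so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [v * scale for v in update]


def privatize_update(
    update: Sequence[float],
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Clip, then add zero-mean Gaussian noise with std = noise_multiplier * max_norm.

    Assumption: this is the generic clip-and-noise (Gaussian mechanism) recipe,
    not a reconstruction of the paper's method.
    """
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]
```

Clipping bounds how much any single update can move the aggregate, which is what lets the added noise be calibrated to `max_norm` rather than to the raw data.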

35 pages, 1511 KB  
Article
Curriculum Learning and Pattern-Aware Highly Efficient Privacy-Preserving Scheme for Mixed Data Outsourcing with Minimal Utility Loss
by Abdul Majeed, Kyunghyun Lee and Seong Oun Hwang
Appl. Sci. 2025, 15(21), 11849; https://doi.org/10.3390/app152111849 - 6 Nov 2025
Viewed by 491
Abstract
A complex problem when outsourcing personal data for public use is balancing privacy protection with utility, and anonymization is a viable solution to address this issue. However, conventional anonymization methods often overlook global information regarding the composition of attributes in data, leading to unnecessary computations and high utility loss. To address these problems, we propose a curriculum learning (CL)-based, pattern-aware privacy-preserving scheme that exploits information about attribute composition in the data to enhance utility and privacy without performing unnecessary computations. The CL approach significantly reduces time overheads by sorting data by complexity, and only the most complex (e.g., privacy-sensitive) parts of the data are processed. Our scheme considers both diversity and similarity when forming clusters to effectively address the privacy–utility trade-off. Our scheme prevents substantial changes in data during generalization by protecting generic portions of the data from futile anonymization, and only a limited amount of data is anonymized through a joint application of differential privacy and k-anonymity. We attain promising results by rigorously testing the proposed scheme on three benchmark datasets. Compared to recent anonymization methods, our scheme reduces time complexity by 74.33%, improves data utility by 19.67% and 68.33% across two evaluation metrics, and enhances privacy protection by 29.19%. Our scheme performs 82.66% fewer lookups in generalization hierarchies than existing anonymization methods. In addition, our scheme is very lightweight and is 1.95× faster than the parallel implementation architectures. Our scheme can effectively solve the trade-off between privacy and utility better than prior works in outsourcing personal data enclosed in tabular form. Full article
(This article belongs to the Special Issue Progress in Information Security and Privacy)

22 pages, 2693 KB  
Review
Federated Learning for Cardiovascular Disease Prediction: A Comparative Review of Biosignal- and EHR-Based Approaches
by Hagyeong Ryu, Myungeun Lee, Soo-hyung Kim, Ju Han Kim and Hyung-jeong Yang
Healthcare 2025, 13(21), 2811; https://doi.org/10.3390/healthcare13212811 - 5 Nov 2025
Viewed by 1059
Abstract
Federated Learning (FL) has emerged as a promising framework for multi-institutional medical artificial intelligence, enabling collaborative model development while preserving data privacy and security. Despite increasing research on federated approaches for cardiovascular disease prediction, previous reviews have largely focused on disease-specific perspectives without systematically comparing data modalities. This study comprehensively examines 28 representative investigations from the past five years, including 17 biosignal-based and 11 electronic health record (EHR)-based applications. Biosignal-based FL emphasizes personalized electrocardiogram (ECG) classification, mitigation of non-independent and identically distributed (Non-IID) data, and Internet of Things (IoT)-based monitoring using methods such as client clustering, asynchronous learning, and Bayesian inference. In contrast, EHR-based studies prioritize large-scale hospital collaboration, adaptive optimization, and secure aggregation through distributed frameworks. By systematically comparing methodological strategies, performance trade-offs, and clinical feasibility, this review highlights the complementary strengths of biosignal- and EHR-based approaches. Biosignal frameworks show strong potential for personalized, low-latency cardiac monitoring, whereas EHR frameworks excel in scalable and privacy-preserving decision support. Building upon the limitations of earlier reviews, this paper introduces data-type-centric design guidelines to enhance the reliability, interpretability, and clinical scalability of FL in cardiovascular diagnosis and prediction. Full article

34 pages, 2025 KB  
Review
EV and Renewable Energy Integration in Residential Buildings: A Global Perspective on Deep Learning, Strategies, and Challenges
by Ahmad Mohsenimanesh, Christopher McNevin and Evgueniy Entchev
World Electr. Veh. J. 2025, 16(11), 603; https://doi.org/10.3390/wevj16110603 - 31 Oct 2025
Viewed by 797
Abstract
Charging electric vehicles (EVs) and integrating renewable energy sources (RESs) are becoming key aspects of residential energy systems. However, the variability of RES generation, combined with uncontrolled EV charging, poses challenges for reliability, power quality, and supply-demand balancing within communities. The challenges only grow when considering other electrified building loads as well. Accurate forecasting of power demand and renewable generation is essential for efficient and sustainable grid operation, optimal use of RESs, and effective energy trading within communities. Deep learning (DL), including supervised, unsupervised, and reinforcement learning (RL), has emerged as a promising solution for predicting consumer demand, renewable generation, and managing energy flows in residential environments. This paper provides a comprehensive review of the development and application of these methods for forecasting and energy management in residential communities. Evaluation metrics across studies indicate that supervised learning can achieve highly accurate forecasting results, especially when integrated with unsupervised K-means clustering and data decomposition. These methods help uncover patterns and relationships within the data while reducing noise, thereby enhancing prediction accuracy. RL shows significant potential in control applications, particularly for charging strategies. Similarly to how V2G-simulators model individual EV usage and simulate large fleets to generate grid-scale predictions, RL can be applied to various aspects of EV fleet management, including vehicle dispatching, smart scheduling, and charging coordination. Traditional methods are also used across different applications and help utilities with planning. However, these methods have limitations and may not always be completely accurate. 
Our review suggests that integrating hybrid supervised-unsupervised learning methods with RL can significantly improve the sustainability and resilience of energy systems. This approach can improve demand and generation forecasting while enabling smart charging coordination and scheduling for scalable EV fleets integrated with building electrification measures. Furthermore, the review introduces a unifying conceptual framework that links forecasting, optimization, and policy coupling through hierarchical deep learning layers, enabling scalable coordination of EV charging, renewable generation, and building energy management. Despite methodological advances, real-world deployment of hybrid and deep learning frameworks remains constrained by data-privacy restrictions, interoperability issues, and computational demands, highlighting the need for explainable, privacy-preserving, and standardized modeling approaches. To be effective in practice, these methods require robust data acquisition, optimized forecasting and control models, and integrated consideration of transport, building, and grid domains. Furthermore, deployment must account for data privacy regulations, cybersecurity safeguards, model interpretability, and economic feasibility to ensure resilient, scalable, and socially acceptable solutions. Full article
(This article belongs to the Section Energy Supply and Sustainability)

21 pages, 783 KB  
Article
SACW: Semi-Asynchronous Federated Learning with Client Selection and Adaptive Weighting
by Shuaifeng Li, Fangfang Shan, Shiqi Mao, Yanlong Lu, Fengjun Miao and Zhuo Chen
Computers 2025, 14(11), 464; https://doi.org/10.3390/computers14110464 - 27 Oct 2025
Viewed by 520
Abstract
Federated learning (FL), as a privacy-preserving distributed machine learning paradigm, demonstrates unique advantages in addressing data silo problems. However, the prevalent statistical heterogeneity (data distribution disparities) and system heterogeneity (device capability variations) in practical applications significantly hinder FL performance. Traditional synchronous FL suffers from severe waiting delays due to its mandatory synchronization mechanism, while asynchronous approaches incur model bias issues caused by training pace discrepancies. To tackle these challenges, this paper proposes the SACW framework, which effectively balances training efficiency and model quality through a semi-asynchronous training mechanism. The framework adopts a hybrid strategy of “asynchronous client training–synchronous server aggregation,” combined with an adaptive weighting algorithm based on model staleness and data volume. This approach significantly improves system resource utilization and mitigates system heterogeneity. Simultaneously, the server employs data distribution-aware client clustering and hierarchical selection strategies to construct a training environment characterized by “inter-cluster heterogeneity and intra-cluster homogeneity.” Representative clients from each cluster are selected to participate in model aggregation, thereby addressing data heterogeneity. We conduct comprehensive comparisons with mainstream synchronous and asynchronous FL methods and perform extensive experiments across various model architectures and datasets. The results demonstrate that SACW achieves better performance in both training efficiency and model accuracy under scenarios with system and data heterogeneity. Full article
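
The abstract names an adaptive weighting algorithm based on model staleness and data volume but gives no formula. The sketch below shows one plausible form under stated assumptions: each client's weight is its sample count multiplied by an exponential staleness discount, then normalized. The `decay` parameter, the exponential form, and both function names are hypothetical and are not taken from the SACW paper.

```python
from typing import List, Sequence


def adaptive_weights(
    sample_counts: Sequence[int],
    staleness: Sequence[int],
    decay: float = 0.5,
) -> List[float]:
    """Weight each client by data volume, discounted by how stale its model is.

    Assumption: staleness is the number of server rounds since the client
    last synchronized, and the discount is decay ** staleness.
    """
    raw = [n * (decay ** s) for n, s in zip(sample_counts, staleness)]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(models: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of flattened client model parameters."""
    dim = len(models[0])
    return [sum(w * m[d] for m, w in zip(models, weights)) for d in range(dim)]
```

With equal data volumes, a client that is one round behind contributes half as much as a fresh one when `decay` is 0.5, so late updates still count but cannot dominate the aggregate.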

12 pages, 284 KB  
Article
AI-Enabled Secure and Scalable Distributed Web Architecture for Medical Informatics
by Marian Ileana, Pavel Petrov and Vassil Milev
Appl. Sci. 2025, 15(19), 10710; https://doi.org/10.3390/app151910710 - 4 Oct 2025
Viewed by 920
Abstract
Current medical informatics systems face critical challenges, including limited scalability across distributed institutions, insufficient real-time AI-driven decision support, and lack of standardized interoperability for heterogeneous medical data exchange. To address these challenges, this paper proposes a novel distributed web system architecture for medical informatics, integrating artificial intelligence techniques and cloud-based services. The system ensures interoperability via HL7 FHIR standards and preserves data privacy and fault tolerance across interconnected medical institutions. A hybrid AI pipeline combining principal component analysis (PCA), K-Means clustering, and convolutional neural networks (CNNs) is applied to diffusion tensor imaging (DTI) data for early detection of neurological anomalies. The architecture leverages containerized microservices orchestrated with Docker Swarm, enabling adaptive resource management and high availability. Experimental validation confirms reduced latency, improved system reliability, and enhanced compliance with medical data exchange protocols. Results demonstrate superior performance with an average latency of 94 ms, a diagnostic accuracy of 91.3%, and enhanced clinical workflow efficiency compared to traditional monolithic architectures. The proposed solution successfully addresses scalability limitations while maintaining data security and regulatory compliance across multi-institutional deployments. This work contributes to the advancement of intelligent, interoperable, and scalable e-health infrastructures aligned with the evolution of digital healthcare ecosystems. Full article
(This article belongs to the Special Issue Data Science and Medical Informatics)

20 pages, 6000 KB  
Article
A Bidding Strategy for Virtual Power Plants in the Day-Ahead Market
by Yueping Kong, Yuqin Chen, Jiao Du, Yongbiao Yang and Qingshan Xu
Energies 2025, 18(18), 4874; https://doi.org/10.3390/en18184874 - 13 Sep 2025
Viewed by 1139
Abstract
In the context of rapid distributed energy development and ongoing electricity market reforms, this paper investigates bidding strategies for virtual power plants (VPPs) formed by aggregated distributed renewable energy (DRE) in China’s evolving day-ahead electricity market. To address the privacy concerns of DRE participants and VPP aggregators during dynamic aggregation, an enhanced Benders decomposition framework is proposed. The methodology first characterizes market uncertainties (e.g., electricity prices and renewable generation output) by clustering them into representative scenarios using K-medoids clustering. A privacy-preserving decentralized optimization model is then formulated: the VPP aggregator solves a master problem to determine bidding decisions, while DRE units independently solve subproblems via privacy-protected mathematical constraints that avoid revealing explicit operational details. The framework ensures secure information exchange and computational efficiency. Case studies demonstrate that the proposed model effectively balances privacy protection and bidding performance, outperforming traditional centralized optimization approaches in terms of solution quality and scalability. Full article

14 pages, 869 KB  
Proceeding Paper
A Novel Adaptive Cluster-Based Federated Learning Framework for Anomaly Detection in VANETs
by Ravikumar Ch, P Sudheer, Isha Batra and Falentino Sembiring
Eng. Proc. 2025, 107(1), 79; https://doi.org/10.3390/engproc2025107079 - 10 Sep 2025
Viewed by 690
Abstract
Vehicular Ad Hoc Networks (VANETs) encounter significant hurdles in anomaly detection owing to their dynamic characteristics, scalability demands, and privacy issues. This research presents a new Adaptive Cluster-Based Federated Learning (ACFL) architecture to tackle these challenges. In contrast to conventional machine learning models, the ACFL framework dynamically organizes vehicles through the Context-Aware Cluster Manager (CACM), which adjusts clusters according to real-time variables like mobility, node density, and communication patterns. Each cluster utilizes Modified Temporal Neural Networks (MTNNs) for localized anomaly detection, employing time-series analysis to improve precision. Federated learning is enabled via the Hierarchical Aggregation Layer (HAL), which effectively consolidates updates across clusters, ensuring scalability and data confidentiality. The proposed framework was assessed in comparison to established machine learning models, including Support Vector Machines (SVM), Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbor (KNN), and the K-Nearest Neighbors with Kernelized Feature Selection and Clustering (KNN-KFSC) approach, utilizing the VeReMi dataset. Findings demonstrate that ACFL surpasses existing models in identifying anomalies, including Global Positioning System (GPS) spoofing and Denial of Service (DoS) attacks, exhibiting enhanced accuracy, adaptability, and scalability. This work emphasizes the capability of ACFL to tackle urgent security issues in VANETs, facilitating the development of secure next-generation intelligent transportation systems. Full article

14 pages, 1313 KB  
Article
A Fast and Privacy-Preserving Outsourced Approach for K-Means Clustering Based on Symmetric Homomorphic Encryption
by Wanqi Tang and Shiwei Xu
Mathematics 2025, 13(17), 2893; https://doi.org/10.3390/math13172893 - 8 Sep 2025
Viewed by 588
Abstract
Training a machine learning (ML) model requires substantial computing resources, and cloud-based outsourced training is a practical way to address a shortage of such resources. However, the cloud may be untrustworthy and may pose a privacy threat to the training process. Currently, most work uses multi-party computation protocols and lattice-based homomorphic encryption algorithms to solve the privacy problem, but these tools are inefficient in communication or computation. Therefore, in this paper, we focus on k-means and propose a fast and privacy-preserving method for outsourced k-means clustering based on symmetric homomorphic encryption (SHE), which is used to encrypt the clustering dataset and model parameters in our scheme. We design an interactive protocol and use various tools to reduce its time overhead. We perform a security analysis and a detailed performance evaluation of our scheme, and the experimental results show that our scheme has better prediction accuracy, as well as lower computation and total overheads. Full article

29 pages, 1260 KB  
Article
Modelling Social Attachment and Mental States from Facebook Activity with Machine Learning
by Stavroula Kridera and Andreas Kanavos
Information 2025, 16(9), 772; https://doi.org/10.3390/info16090772 - 5 Sep 2025
Viewed by 1006
Abstract
Social networks generate vast amounts of data that can reveal patterns of human behaviour, social attachment, and mental states. This paper explores advanced machine learning techniques to detect and model such patterns, focusing on community structures, influential users, and information diffusion pathways. To address the scale, noise, and heterogeneity of social data, we leverage recent advances in graph theory, natural language processing, and anomaly detection. Our framework combines clustering for community detection, sentiment analysis for emotional state inference, and centrality metrics for influence estimation, while integrating multimodal data—including textual and visual content—for richer behavioural insights. Experimental results demonstrate that the proposed approach effectively extracts actionable knowledge, supporting mental well-being and strengthening digital social ties. Furthermore, we emphasise the role of privacy-preserving methods, such as federated learning, to ensure ethical analysis. These findings lay the groundwork for responsible and effective applications of machine learning in social network analysis. Full article
(This article belongs to the Special Issue Information Extraction and Language Discourse Processing)

23 pages, 3739 KB  
Article
FedDPA: Dynamic Prototypical Alignment for Federated Learning with Non-IID Data
by Oussama Akram Bensiah and Rohallah Benaboud
Electronics 2025, 14(16), 3286; https://doi.org/10.3390/electronics14163286 - 19 Aug 2025
Viewed by 1476
Abstract
Federated learning (FL) has emerged as a powerful framework for decentralized model training, preserving data privacy by keeping datasets localized on distributed devices. However, data heterogeneity, characterized by significant variations in size, statistical distribution, and composition across client datasets, presents a persistent challenge that impairs model performance, compromises generalization, and delays convergence. To address these issues, we propose FedDPA, a novel framework that utilizes dynamic prototypical alignment. FedDPA operates in three stages. First, it computes class-specific prototypes for each client to capture local data distributions, integrating them into an adaptive regularization mechanism. Next, a hierarchical aggregation strategy clusters and combines prototypes from similar clients, which reduces communication overhead and stabilizes model updates. Finally, a contrastive alignment process refines the global model by enforcing intra-class compactness and inter-class separation in the feature space. These mechanisms work in concert to mitigate client drift and enhance global model performance. We conducted extensive evaluations on standard classification benchmarks (EMNIST, FEMNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet 200) under various non-independent and identically distributed (non-IID) scenarios. The results demonstrate the superiority of FedDPA over state-of-the-art methods, including FedAvg, FedNH, and FedROD. Our findings highlight FedDPA’s enhanced effectiveness, stability, and adaptability, establishing it as a scalable and efficient solution to the critical problem of data heterogeneity in federated learning. Full article

20 pages, 6757 KB  
Article
FLUID: Dynamic Model-Agnostic Federated Learning with Pruning and Knowledge Distillation for Maritime Predictive Maintenance
by Alexandros S. Kalafatelis, Angeliki Pitsiakou, Nikolaos Nomikos, Nikolaos Tsoulakos, Theodoros Syriopoulos and Panagiotis Trakadas
J. Mar. Sci. Eng. 2025, 13(8), 1569; https://doi.org/10.3390/jmse13081569 - 15 Aug 2025
Cited by 1 | Viewed by 1225
Abstract
Predictive maintenance (PdM) is vital to maritime operations; however, current deep learning solutions depend heavily on centralized data aggregation, which is impractical under the limited connectivity, privacy concerns, and resource constraints found in maritime vessels. Federated Learning (FL) addresses privacy by training models locally, yet most FL methods assume homogeneous client architectures and exchange full model weights, leading to heavy communication overhead and sensitivity to system heterogeneity. To overcome these challenges, we introduce FLUID, a dynamic, model-agnostic FL framework that combines client clustering, structured pruning, and student–teacher knowledge distillation. FLUID first groups vessels into resource tiers and calibrates pruning strategies on the most capable client to determine optimal sparsity levels. In subsequent FL rounds, clients exchange logits over a small reference set, decoupling global aggregation from specific model architectures. We evaluate FLUID on a real-world heavy-fuel-oil purifier dataset under realistic heterogeneous deployment. With mixed pruning across clients, FLUID achieves a global R² of 0.9352, compared with 0.9757 for a centralized baseline. Predictive consistency also remains high for client-based data, with a mean per-client MAE of 0.02575 ± 0.0021 and a mean RMSE of 0.0419 ± 0.0036. These results demonstrate FLUID’s ability to deliver accurate, efficient, and privacy-preserving PdM in heterogeneous maritime fleets. Full article
(This article belongs to the Special Issue Intelligent Solutions for Marine Operations)
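The output-exchange step in the FLUID abstract (clients share predictions on a small reference set instead of model weights) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: `aggregate_outputs` and `distillation_loss` are hypothetical names, the consensus is assumed to be an unweighted mean, and mean squared error is assumed as the distillation loss because the reported metrics suggest a regression task.

```python
def aggregate_outputs(client_outputs):
    """Element-wise mean of each client's outputs on the shared reference set.

    client_outputs: list of equal-length lists, one list per client.
    Assumption: every client is weighted equally.
    """
    n_clients = len(client_outputs)
    return [sum(vals) / n_clients for vals in zip(*client_outputs)]


def distillation_loss(student_outputs, teacher_outputs):
    """Mean squared error between one client's outputs and the consensus."""
    n = len(teacher_outputs)
    return sum((s - t) ** 2 for s, t in zip(student_outputs, teacher_outputs)) / n


# Two hypothetical clients with different architectures predict on the same
# two reference samples; only these outputs are exchanged.
outputs_a = [1.0, 2.0]
outputs_b = [3.0, 4.0]
consensus = aggregate_outputs([outputs_a, outputs_b])
loss_a = distillation_loss(outputs_a, consensus)
```

Because only per-sample outputs are compared, the clients do not need matching layer shapes, which is the sense in which aggregation is decoupled from model architecture.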

21 pages, 528 KB  
Article
A Privacy-Enhanced Multi-Stage Dimensionality Reduction Vertical Federated Clustering Framework
by Jun Wang, Jiantong Zhang and Xianghua Chen
Electronics 2025, 14(16), 3182; https://doi.org/10.3390/electronics14163182 - 10 Aug 2025
Viewed by 697
Abstract
Federated Clustering (FL clustering) aims to discover latent knowledge in multi-source distributed data through clustering algorithms while preserving data privacy. Federated learning is categorized into horizontal and vertical federated learning based on data partitioning scenarios. Horizontal federated learning is applicable to scenarios with overlapping feature spaces but different sample IDs across parties. Vertical federated learning facilitates cross-institutional feature complementarity, which is particularly suited for scenarios with highly overlapping sample IDs yet significantly divergent features. As a classic clustering algorithm, k-means has seen extensive improvements and applications in horizontal federated learning. However, its application in vertical federated learning remains insufficiently explored, with room for enhancement in privacy protection and communication efficiency. Simultaneously, client feature imbalance may lead to biased clustering results. To improve communication efficiency, this paper introduces Product Quantization (PQ) to compress high-dimensional data into low-dimensional codes by generating local codebooks. Leveraging the inherent k-means algorithm within PQ, local training preserves data structures while overcoming privacy risks associated with traditional PQ methods that require server-side data reconstruction (which may leak data distributions). To enhance privacy without compromising performance, Multidimensional Scaling (MDS) maps codebook cluster centers into distance-preserving indices. Only these indices are uploaded to the server, eliminating the need for data reconstruction. The server executes k-means on the indices to maximize intra-group similarity and maximize inter-group divergence. This scheme retains original codebooks locally for strict privacy protection. The nested application of PQ and MDS significantly reduces communication volume and frequency while effectively alleviating clustering bias caused by client feature dimension imbalance. Validation on the MNIST dataset confirms that the approach maintains k-means clustering performance while meeting federated learning requirements for privacy and efficiency. Full article
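The Product Quantization encoding step mentioned in this abstract can be sketched in a few lines. This is a generic PQ encoder, not the paper's pipeline: the names `nearest_index` and `pq_encode` are hypothetical, the codebooks are assumed to be already trained (in practice by running k-means on each sub-space), and the MDS mapping and server-side k-means are omitted.

```python
def nearest_index(subvec, codebook):
    """Index of the codeword closest to subvec in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(subvec, codebook[i]))


def pq_encode(vec, codebooks):
    """Split vec into len(codebooks) equal sub-vectors and encode each one
    as the index of its nearest codeword in the matching codebook."""
    m = len(codebooks)
    if len(vec) % m != 0:
        raise ValueError("vector length must be divisible by the number of codebooks")
    d = len(vec) // m
    return [nearest_index(vec[j * d:(j + 1) * d], codebooks[j]) for j in range(m)]


# Hypothetical codebooks: two sub-spaces, two codewords each.
codebooks = [
    [[0.0, 0.0], [10.0, 10.0]],  # codewords for dimensions 0-1
    [[0.0, 0.0], [5.0, 5.0]],    # codewords for dimensions 2-3
]
code = pq_encode([9.0, 9.0, 1.0, 1.0], codebooks)
```

A four-dimensional vector is reduced here to two small integers, which illustrates why transmitting codes instead of raw features lowers communication volume.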