1. Introduction
Spam emails not only disseminate misinformation such as scams and rumors but also threaten the information security of organizations and individuals. For instance, cybercriminals use phishing emails to entice recipients into clicking malicious links or downloading attachments, thereby gaining access to personal data, bank accounts, and other confidential information [1]. Spam is a primary conduit for cyberattacks, and the security threats it carries are rising markedly: malicious attachments and phishing links account for approximately 43% of its dissemination [2]. In recent decades, researchers have explored numerous countermeasures, including mitigation strategies against spam email spoofing attacks [3], systems for evaluating user behavior in response to phishing emails [4], cybersecurity training to strengthen resilience to phishing attacks [5,6], email security awareness programs [7], and anti-phishing solutions tailored to specific corporate contexts [8]. These methods address spam from multiple viewpoints; however, they are often difficult to deploy and typically require manual intervention to be effective.
Benefiting from advances in natural language processing (NLP), particularly machine learning and deep learning techniques [9,10], new capabilities have emerged for spam detection systems [11]. Early machine learning detectors for spam on email and IoT platforms relied predominantly on supervised learning [12,13,14]. Supervised learning requires manual dataset annotation, which demands substantial time and effort and depends heavily on the quality and quantity of the training data [15]; this dependency poses significant challenges for experimental implementation. Deep learning models with dynamically updated feature spaces [16]—such as Long Short-Term Memory (LSTM) [17], Convolutional Neural Networks (CNN) [18], Gated Recurrent Units (GRU) [19], and their bidirectional variants—offer superior feature extraction and outperform traditional machine learning methods. However, single neural network architectures have inherent limitations: constrained capture of temporal dependencies, insufficient depth in automated hierarchical representation learning, and inadequate modeling of semantically complex text. They also remain susceptible to overfitting. Together, these limitations leave single-network models short of the desired effectiveness.
Hybrid deep learning methods [20] are adept at leveraging contextual information from multiple modules and layers. However, these methods [21,22,23,24] require constructing a dictionary to process text, and the resulting word vectors are static, which complicates handling polysemy. Pre-trained models [25] and large language models [26,27] have also been widely applied to spam detection, but their training costs are a substantial obstacle. By incorporating multi-feature fusion (MF), some methods [28,29] improve a model's feature extraction capability. Although feature fusion has proven effective in a variety of fields [30,31,32,33], it has not provided privacy protection when detecting spam.
Email was originally intended to let users share information conveniently and securely [34], and spam detection should not come at the cost of leaking private data. Federated learning (FL) [35,36] offers distinct advantages through its decentralized architecture and distributed training, enabling collaborative learning across many devices and servers while safeguarding data privacy and security [37,38]. In federated learning, participants train models autonomously on local datasets and transmit only the resulting model parameters to a central server. The server consolidates the parameters from the participants with an aggregation algorithm, produces globally optimized parameters, and relays them back to each participant; this procedure repeats over successive rounds. Because the raw data never leaves the client, the risks of cross-domain data transmission and leakage are mitigated, fundamentally addressing the privacy challenges inherent in traditional centralized learning. This mechanism also dismantles data silos, enabling collaborative modeling over multi-source data and improving model generalization and robustness while preserving data sovereignty.
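The parameter-exchange loop described above can be sketched in a few lines. The following is a minimal, illustrative numpy sketch, not the model proposed in this paper: each client takes one gradient step on its private data, and the server combines the returned parameters with a FedAvg-style weighted average. The `local_update` function and the toy gradients and dataset sizes are hypothetical.

```python
import numpy as np

def local_update(weights, grad, lr=0.1):
    """One hypothetical local training step on a client's private data."""
    return weights - lr * grad

def federated_average(client_weights, client_sizes):
    """Server-side aggregation: weight each client's parameters by its
    local dataset size (the FedAvg rule)."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# One round: the server broadcasts the global weights; each client trains
# locally and returns only its updated parameters, never its raw emails.
global_w = np.zeros(3)
clients = [
    (np.array([0.2, -0.1, 0.4]), 100),  # (local gradient, dataset size)
    (np.array([0.1,  0.3, 0.0]), 300),
]
updates = [local_update(global_w, g) for g, _ in clients]
global_w = federated_average(updates, [n for _, n in clients])
```

Note that only `updates` crosses the network; the arrays standing in for each client's data stay on the client.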
Federated learning has now been adopted across many artificial intelligence scenarios [39], including blockchain [40], smart healthcare [41,42], medical imaging [43], and the Internet of Things (IoT) [44,45], in each case leveraging its ability to prevent private data leakage. Although several methods have achieved notable performance in spam detection [46,47], they tend to be biased toward majority classes under imbalanced data distributions, resulting in unstable classification and limited detection capability. Single neural network models suffer from training instability that can lead to mode collapse, while multi-module architectures face expressiveness limits due to excessive complexity. These issues are critical pain points in contemporary spam detection research.
To address these challenges, this paper integrates a federated learning framework with multi-feature fusion and proposes a novel spam detection model within this architecture. The model employs three key technical enhancements: first, the FedProx aggregation algorithm [48,49] compensates for imbalanced data distributions; second, a horse-racing selection strategy improves stability during server-side parameter aggregation; third, hierarchical multi-feature fusion mitigates the limitations of both single and multi-module architectures. As a result, the model significantly reduces computational overhead while improving training stability, achieving the dual objectives of preventing sensitive privacy leakage and improving detection efficiency.
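As a rough illustration of how FedProx differs from plain local training, the sketch below adds the gradient of the proximal term (mu/2)·||w − w_global||² to each client step, pulling heterogeneous clients back toward the global model. The toy quadratic loss, learning rate, and coefficient values are illustrative assumptions, not the configuration used in this work.

```python
import numpy as np

def fedprox_local_step(w, w_global, grad_fn, mu=0.5, lr=0.1):
    """One FedProx client step: local-loss gradient plus the gradient of the
    proximal term (mu/2)*||w - w_global||^2, which limits client drift on
    non-IID data."""
    grad = grad_fn(w) + mu * (w - w_global)
    return w - lr * grad

# Toy quadratic local loss ||w - target||^2 on one client (hypothetical data).
target = np.array([1.0, 1.0])
grad_fn = lambda w: 2.0 * (w - target)

w_global = np.zeros(2)
w = fedprox_local_step(w_global.copy(), w_global, grad_fn)
```

With mu = 0 this reduces to the plain local step used by FedAvg; larger mu keeps clients with skewed data closer to the shared model.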
The remainder of this paper is organized as follows:
Section 2 reviews recent federated and non-federated detection methods.
Section 3 provides a detailed description of the technologies and methods, including data acquisition and preparation, federated learning optimization strategies, word vector transformation, and pathway prediction.
Section 4 presents the architecture of the federated learning system and discusses the specific structure of the model.
Section 5 presents a detailed analysis of FedAvg and FedProx on six datasets.
Section 6 concludes the paper and proposes research directions for future federated learning methods.
2. Related Research
A substantial body of high-quality research has addressed the detection and classification of emails. Brindha et al. [50] proposed ICSOA-DLPEC, a phishing email detection and classification model built on the intelligent cuckoo search (CS) optimization algorithm. ICSOA-DLPEC first performs a three-step preprocessing procedure of email cleansing, tokenization, and stop-word removal; the CS algorithm then extracts pertinent feature vectors, and the N-gram method is combined with a GRU model to identify and categorize phishing emails. Chinta et al. [51] used a BERT-LSTM hybrid model that, through extensive preprocessing and feature extraction, effectively identifies intricate patterns in phishing emails. While these methods achieve remarkable accuracy in spam detection, they do not address user privacy protection.
The primary challenge on social networks is safeguarding privacy, and the data security of email is a crucial, persistent concern within them. Ul Haq et al. [47] proposed a federated phishing email filtering (FPF) technology combining federated learning, natural language processing (NLP), and deep learning. It provides four training modalities: Training from Server Model (TSM), Training from New Data (TND), Re-Training with Incremental Learning (TIL), and Model Averaging (MA). It detects spam without exchanging email content, maintaining accuracy consistently between 93% and 96%.
Thapa et al. [52] integrated the BERT model with federated learning and systematically assessed the performance of BERT [53], THEMIS, and THEMISb under three data distribution scenarios: asymmetric distributions, highly heterogeneous client datasets, and balanced distributions. The study found that federated learning is remarkably robust in complex scenarios involving imbalanced or highly dispersed data. However, the efficacy of the BERT model for phishing email detection within a centralized learning (CL) framework remains unverified, and the method's local training and global aggregation stages rely heavily on BERT's feature extraction abilities. Moreover, several experiments showed significant variability in the global model's test results, underscoring the need for improved stability.
Kaushal et al. [54] introduced a federated learning-based fair clustering method that tackles privacy preservation and the intrinsic data distribution disparity of decentralized systems, showing substantial performance gains over conventional federated learning models. Compared with centralized models and conventional federated learning, this approach offers an efficient solution for decentralized spam detection, exploiting federated learning's privacy protection while improving data fairness, and it lays the groundwork for broad deployment of federated learning in privacy-sensitive areas. Nonetheless, despite this technological advance, its detection accuracy leaves considerable room for improvement.
Venčkauskas et al. [55] proposed a method to strengthen the resilience of the federated learning (FL) global model against Byzantine attacks by accepting updates only from reliable participants. The approach combines FL with a domain-specific email ontology, a semantic parser, and a benchmark dataset collected from a heterogeneous email corpus, ensuring a high level of privacy protection. It continuously predicts malicious behavior in client models and is notably effective against malicious attacks; using a heterogeneous email corpus as the benchmark also addresses challenges arising from data heterogeneity. However, because the approach is built on a machine learning model, its performance across metrics is not fully satisfactory, with an accuracy of only about 80.0%; although it effectively mitigates malicious attacks, improving model performance remains future work.
Anh et al. [56] employed PhoBERT, a lightweight Transformer architecture, for SMS spam detection while preserving privacy through federated learning. They applied different aggregation algorithms to Vietnamese and English messages and experimented on both IID and non-IID data distributions, with each aggregation algorithm showing highly competitive performance. Although the federated approach matched the classification capability of centralized training, the model was only a lightweight Transformer without deeper integration of other neural networks, and the feature representations extracted by a single Transformer are inherently limited for complex semantics. Moreover, the dataset was relatively small and predominantly Vietnamese, which may limit the generalizability of this lightweight framework.
Table 1 compares the advantages and disadvantages of the methods discussed above, grouping them into non-federated approaches [23,50,51] and federated learning techniques [47,52,54,55,56]. Non-federated detection methods inherently lack the capacity to safeguard user privacy, whereas federated learning, while better protecting email data, introduces its own constraints. A comparative evaluation of methodological performance underscores that spam detection must respect the private nature of email communications, warranting the adoption of federated learning for strengthened privacy preservation. Nevertheless, directly transplanting conventional methods into a federated learning framework substantially degrades performance relative to their centralized counterparts. In response, a key contribution of this work is the strategic integration of previously effective neural architectures, such as BiGRU and BiLSTM, to harness their complementary representational strengths. Empirical comparisons demonstrate that the proposed hybrid model yields competitive advantages over existing approaches.
6. Conclusions
The detrimental effects of spam have been thoroughly documented by reputable organizations. Despite advances in current filtering technologies, conventional detection methods still exhibit considerable deficiencies in privacy protection. This study presents FPW-BC, an email detection model that combines a federated learning framework with feature fusion mechanisms. The fundamental innovation lies in exploiting federated learning's property of "transmitting only model parameters without disclosing raw data" to guarantee privacy protection. FPW-BC establishes a multi-tiered fusion architecture: the BiLSTM module intensifies the model's focus on critical semantic information, the CNN module augments the extraction of local feature-space information, and the output features of both modules are fused to attain accurate classification. Experimental results indicate that the model markedly surpasses contemporary federated learning-based detection techniques on six prominent public datasets: CEAS, Enron, Ling, Phishingemail, Spamemail, and Fakephishing, attaining 99.78% accuracy on Fakephishing and 99.34% on CEAS.
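For intuition, the two-branch fusion idea can be caricatured in plain numpy. This is only an illustrative stand-in, not FPW-BC itself: a mean-pooled summary plays the role of the BiLSTM's contextual features, a max-pooled sliding-window average stands in for the CNN's local n-gram features, and the two are concatenated before a logistic output. All function names, shapes, and values here are hypothetical.

```python
import numpy as np

def contextual_branch(emb):
    """Stand-in for the BiLSTM branch: a sequence-level context summary."""
    return emb.mean(axis=0)

def local_branch(emb, window=3):
    """Stand-in for the CNN branch: max-pooled sliding-window features."""
    pooled = [emb[i:i + window].mean(axis=0)
              for i in range(len(emb) - window + 1)]
    return np.max(pooled, axis=0)

def fused_spam_score(emb, w):
    """Concatenate both branches, then apply a logistic classifier head."""
    feats = np.concatenate([contextual_branch(emb), local_branch(emb)])
    return 1.0 / (1.0 + np.exp(-feats @ w))

# Toy "email": 4 tokens with 3-dimensional embeddings (hypothetical values).
emb = np.arange(12, dtype=float).reshape(4, 3)
score = fused_spam_score(emb, np.zeros(6))  # untrained head -> 0.5
```

The point of the concatenation step is that the classifier head sees both global context and local pattern evidence at once, rather than either branch alone.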
In practical settings, non-IID data and significant heterogeneity are common, where data distributions across clients exhibit considerable divergence and may even demonstrate severe class or quantity imbalance. Data imbalance constitutes a major practical obstacle, as many methods struggle to achieve robust spam detection performance under such conditions. Moving forward, our work will prioritize tackling data heterogeneity in federated learning through the adoption of enhanced federated optimization algorithms, multi-feature fusion mechanisms, and refined local aggregation strategies, with the goal of alleviating the current limitations of models in these respects.
Although federated learning mitigates privacy risks during parameter uploading, potential data leakage remains during model update and aggregation phases. For instance, when multiple participants submit updates to a central server, adversaries may infer sensitive data characteristics by monitoring model updates or analyzing variations across participants’ updates to deduce statistical properties of local datasets. Such information can be exploited in reconstruction attacks to recover private training data.
To counter these threats, differential privacy can be applied to inject carefully calibrated noise into the output, preventing attackers from inferring individual information or data features through output analysis. Alternatively, homomorphic encryption enables computation on encrypted data without decryption, thereby obscuring model update patterns and protecting data privacy during processing. Given the critical importance of data privacy in today’s networked society, enhancing privacy preservation remains a key direction for our future work.
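The differential-privacy route can be sketched with the standard clip-then-add-Gaussian-noise recipe. The parameter names `clip_norm` and `noise_multiplier` follow common DP-SGD usage and the values are purely illustrative; a real deployment would also need a privacy accountant to track the resulting epsilon.

```python
import numpy as np

def dp_sanitize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Sanitize one client update before it leaves the device: clip its L2
    norm to bound any single client's influence, then add calibrated
    Gaussian noise so the server cannot reconstruct local training data."""
    rng = rng if rng is not None else np.random.default_rng(0)
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# With the noise disabled, only the clipping is visible: [3, 4] has norm 5,
# so the update is scaled down to [0.6, 0.8] before transmission.
sanitized = dp_sanitize_update(np.array([3.0, 4.0]), noise_multiplier=0.0)
```

Homomorphic encryption, by contrast, would leave the update values exact but encrypted, trading noise for computational cost.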
During the process of privacy preservation, the challenge of data heterogeneity inevitably persists. While addressing such heterogeneity, it becomes imperative to simultaneously maintain the usability and privacy of the data. Therefore, integrating privacy protection mechanisms into solutions for data heterogeneity is likely to emerge as a critical trend. In subsequent work, we will focus on collaborative multi-model design strategies that enhance detection accuracy while ensuring privacy security. This approach aims to effectively mitigate data heterogeneity and promote the dual advancement of both privacy and efficiency in spam detection technology.