FL-TENB4: A Federated-Learning-Enhanced Tiny EfficientNetB4-Lite Approach for Deepfake Detection in CCTV Environments
Abstract
1. Introduction
2. Related Work
2.1. Existing Studies
2.2. Key Considerations
- Efficiency: Efficiency is a primary concern when designing a deepfake detection system for CCTV environments, where large volumes of video data must be analyzed. Conventional deep learning models often prioritize accuracy at the expense of computational feasibility, which makes them unsuitable for resource-limited edge devices. The proposed system uses EfficientNetB4-Lite, a lightweight version of EfficientNetB4 optimized for high-definition video deepfake detection, to balance accuracy against computational cost. Compound scaling, together with further optimization through Tiny Machine Learning (TinyML), allows the system to maintain high detection accuracy while reducing model size and computational overhead. These refinements make EfficientNetB4-Lite well suited to deployment on resource-constrained edge devices [27].
- Real-time: Another essential consideration is the ability to process video data in real-time. CCTV systems require immediate analysis of incoming video streams to detect manipulations as they occur. The proposed solution leverages the EfficientNetB4-Lite model, an optimized CNN architecture that achieves both computational efficiency and rapid inference speed. The lightweight nature of EfficientNetB4-Lite, combined with TinyML optimizations such as quantization and pruning, ensures that the detection system can handle real-time processing demands. This capability is critical for maintaining the integrity of surveillance footage and enabling prompt responses to potential security threats [28].
- Security and Privacy: The protection of security and privacy is equally vital in the context of CCTV systems. Traditional deep learning approaches often rely on centralized data processing, which poses significant risks to data privacy and security. By adopting a Federated Learning (FL) framework, the proposed architecture addresses these concerns effectively. Federated Learning enables decentralized model training, where sensitive data remain on local devices, and only model updates are shared with a central server. This approach not only preserves user privacy but also reduces the risks associated with data breaches and unauthorized access, aligning with the principles of modern data protection regulations [29].
- Resource constraints: Resource constraints are a significant challenge, as CCTV systems typically operate on edge devices with limited processing power and memory capacity. To address this issue, the design of the proposed solution incorporates TinyML optimizations to ensure compatibility with such hardware limitations. Specifically, quantization techniques such as dynamic range quantization and full integer quantization are applied to EfficientNetB4-Lite. These optimizations significantly reduce model size and inference latency while maintaining detection accuracy. As a result, the system becomes highly feasible for deployment in real-world CCTV environments, even on devices with minimal computational resources [30].
- Scalability: Scalability is a critical consideration when deploying deepfake detection systems in large-scale surveillance networks. The Federated Learning framework supports decentralized training and scales by allowing multiple devices to collaboratively train a global model. This approach reduces the need for extensive infrastructure and centralized computation, making it adaptable to diverse and distributed CCTV systems [31].
3. Proposed Architecture
3.1. System Overview
3.2. TENB4: TinyML-Based EfficientNetB4-Lite Local Model for Deepfake Detection
3.2.1. EfficientNetB4-Lite-Based Deepfake Detection Model
Algorithm 1: EfficientNetB4-Lite-Based Deepfake Detection Model
1: Input: Real-time image frame stream from CCTV device (Frame_t), pre-trained EfficientNetB4-Lite model optimized for TinyML (Local_Model)
2: Output: Detection result (Real or Fake) for each frame
3: Process:
4: Initialize TinyML inference engine on Edge Device
5: Load Local_Model (EfficientNetB4-Lite optimized with quantization)
6: while True do
7:   Frame_t = Capture next frame from CCTV stream
8:   Frame_resized = Resize(Frame_t, 260 × 260)
9:   Frame_norm = Frame_resized / 255
10:   ($P_{\text{real}}$, $P_{\text{fake}}$) = Local_Model(Frame_norm)
11:   if $P_{\text{fake}} > \tau$ then
12:     Result = Fake
13:     Trigger alert
14:   else
15:     Result = Real
16:   end if
17: end while
- 1. Input and Initialization: The algorithm takes two inputs: real-time image frames Frame_t, captured sequentially from a CCTV device’s video stream, and a pre-trained EfficientNetB4-Lite model optimized for TinyML deployment. The algorithm initializes the TinyML inference engine on the edge device. This engine is implemented using a TensorFlow Lite (TFLite) 2.17.0 interpreter, which executes the EfficientNetB4-Lite model optimized with quantization techniques (such as dynamic range or full integer quantization).
- Real-Time Frame Acquisition: The algorithm continuously acquires video frames from the CCTV stream in a loop. Each frame represents a snapshot in time and is a three-dimensional tensor $\text{Frame}_t \in \mathbb{R}^{H \times W \times 3}$, where $H$ and $W$ are the original height and width of the image. These frames are processed sequentially to maintain real-time performance.
- 2. Preprocessing: Each frame undergoes a series of preprocessing steps to prepare it for input into the EfficientNetB4-Lite model.
  - Resizing: The original frame is resized to 260 × 260 pixels, the input dimension expected by the model. This resizing ensures uniformity and compatibility while maintaining critical visual details for detection. Mathematically: $\text{Frame}_{\text{resized}} = \mathrm{Resize}(\text{Frame}_t,\ 260 \times 260)$.
  - Normalization: The pixel values of the resized frame are normalized to the range [0, 1] by dividing each pixel intensity by 255. This normalization stabilizes the model’s numerical computations during inference. Mathematically: $\text{Frame}_{\text{norm}} = \text{Frame}_{\text{resized}} / 255$.
  - Tensor Conversion: The normalized frame is converted into a tensor format, with an additional batch dimension added for compatibility with the model. The resulting tensor has a shape of $(1, 260, 260, 3)$.
- 3. Model Inference: The preprocessed frame is passed to the EfficientNetB4-Lite model for inference. This model, which has been optimized for TinyML using quantization, predicts probabilities for two outcomes:
  - $P_{\text{real}}$: The probability that the frame is authentic.
  - $P_{\text{fake}}$: The probability that the frame is a deepfake.
  - The output satisfies the condition $P_{\text{real}} + P_{\text{fake}} = 1$. The detection output is represented as the pair $(P_{\text{real}}, P_{\text{fake}})$.
- 4. Decision-Making and Alert Generation: The algorithm applies a predefined detection threshold $\tau$ to the fake probability $P_{\text{fake}}$ to classify the frame:
  - If $P_{\text{fake}} > \tau$, the frame is classified as fake, and the system triggers an alert to notify the monitoring authority.
  - Otherwise, the frame is classified as real, and no further action is taken.
- 5. Continuous Processing: The algorithm operates in an infinite loop, repeatedly capturing, processing, and analyzing frames from the video stream. This ensures uninterrupted surveillance and prompt detection of deepfakes in real time.
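The per-frame steps described above (resize, normalize, add a batch dimension, infer, threshold) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-neighbour resize, the threshold value of 0.5, and the `brightness_model` stand-in for the quantized TFLite model are all hypothetical choices made so the example runs without external libraries.

```python
from typing import Callable, List, Tuple

Frame = List[List[List[float]]]                        # H x W x 3 nested lists
Model = Callable[[List[Frame]], Tuple[float, float]]   # batch -> (p_real, p_fake)

INPUT_SIZE = 260   # model input side length stated in the text
THRESHOLD = 0.5    # assumed value; the source does not state the threshold


def resize_nearest(frame: Frame, size: int = INPUT_SIZE) -> Frame:
    """Nearest-neighbour resize of an H x W x 3 frame to size x size x 3."""
    h, w = len(frame), len(frame[0])
    return [
        [list(frame[(r * h) // size][(c * w) // size]) for c in range(size)]
        for r in range(size)
    ]


def normalize(frame: Frame) -> Frame:
    """Scale 8-bit pixel intensities into [0, 1] by dividing by 255."""
    return [[[v / 255.0 for v in px] for px in row] for row in frame]


def preprocess(frame: Frame) -> List[Frame]:
    """Resize, normalize, and add a batch dimension: shape (1, 260, 260, 3)."""
    return [normalize(resize_nearest(frame))]


def classify(p_fake: float, threshold: float = THRESHOLD) -> str:
    """Label a frame 'Fake' when the fake probability exceeds the threshold."""
    return "Fake" if p_fake > threshold else "Real"


def detect(frame: Frame, model: Model) -> str:
    """One iteration of the loop body: preprocess, infer, decide."""
    _p_real, p_fake = model(preprocess(frame))
    return classify(p_fake)


def brightness_model(batch: List[Frame]) -> Tuple[float, float]:
    """Hypothetical stand-in for the quantized model: mean intensity as p_fake."""
    values = [v for row in batch[0] for px in row for v in px]
    p_fake = sum(values) / len(values)
    return 1.0 - p_fake, p_fake
```

In a deployment, `brightness_model` would be replaced by a call into the TFLite interpreter, and `detect` would run inside the capture loop with an alert raised whenever it returns `"Fake"`.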
3.2.2. Model Compression Through TinyML Optimization
Algorithm 2: Quantization (TinyML Optimization)
1: Input: Pre-trained Local_Model (EfficientNetB4-Lite), Quantization type (Dynamic Range, Full Integer, or Float16)
2: Output: Optimized TFLite Model (TFLite_Model)
3: Process:
4: Export the Local_Model from training environment
5: Convert Local_Model to TensorFlow Lite format:
6:   TFLite_Model = Convert(Local_Model)
7: Apply Quantization techniques:
8:   a. Dynamic Range Quantization:
9:     Quantized Weights: $W_{\text{INT8}} = \mathrm{Quantize}(W_{\text{FP32}})$
10:   b. Full Integer Quantization:
11:     Quantized Activations: $A_{\text{INT8}} = \mathrm{Quantize}(A_{\text{FP32}})$
12:   c. Float16 Quantization:
13:     Quantized Weights: $W_{\text{FP16}} = \mathrm{Cast}_{\text{FP16}}(W_{\text{FP32}})$
14: Deploy the Quantized TFLite Model to Edge Device
- Dynamic Range Quantization: Dynamic range quantization compresses the model weights by converting them from FP32 to 8-bit integer (INT8) format, while keeping the activations in FP32. This approach does not require calibration data and is relatively straightforward to implement. It offers a balance between model size reduction and preservation of detection accuracy, although it provides a lower level of compression compared to other techniques. This method is particularly suitable for environments where simplicity and rapid deployment are essential, such as general-purpose CPUs or older edge devices without native INT8 support. Because the activations remain in FP32, the latency reduction is limited. The quantization process is represented mathematically as $W_{\text{INT8}} = \mathrm{Quantize}(W_{\text{FP32}})$, with activations left in FP32.
- Full Integer Quantization: Full integer quantization compresses both the weights and the activations to INT8 format. This method requires calibration data so that the quantized model approximates the behavior of the original FP32 model and retains detection accuracy close to it. By quantizing the entire inference pipeline, including activations, full integer quantization maximizes latency reduction and power efficiency. It is thus well suited to real-time deepfake detection on resource-constrained edge devices such as the Raspberry Pi 4 or other ARM-based processors with native INT8 support. The quantization process is mathematically expressed as $W_{\text{INT8}} = \mathrm{Quantize}(W_{\text{FP32}})$ and $A_{\text{INT8}} = \mathrm{Quantize}(A_{\text{FP32}})$.
- Float16 Quantization: Float16 (FP16) quantization reduces model weights to 16-bit floating-point precision. Compared to integer quantization methods, Float16 achieves a moderate level of compression while maintaining higher numerical precision. This method balances reduced memory usage against preserved accuracy, making it suitable for edge devices with slightly higher computational capabilities. It is most effective in deployments requiring higher numerical precision, such as edge devices with GPUs or specialized hardware like NVIDIA Jetson, which natively support FP16 computations. While it provides limited latency reduction compared to full integer quantization, it ensures minimal degradation in model accuracy. The mathematical transformation for Float16 quantization is $W_{\text{FP16}} = \mathrm{Cast}_{\text{FP16}}(W_{\text{FP32}})$.
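To make the three schemes concrete, the following sketch shows the arithmetic that typically underlies them. It is an illustration under assumptions rather than the paper's procedure: it uses the common affine mapping real ≈ scale × (q − zero_point), a symmetric per-tensor scale for weights, an asymmetric range taken from calibration samples for activations, and Python's `struct` half-precision format for FP16 rounding. All function names are hypothetical.

```python
import struct
from typing import List, Sequence, Tuple


def quantize_weights_symmetric(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Map FP32 weights to INT8 with a symmetric scale (zero point fixed at 0)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def calibrate_activations(samples: Sequence[float]) -> Tuple[float, int]:
    """Derive an asymmetric (scale, zero_point) from calibration activations."""
    lo = min(min(samples), 0.0)   # keep 0.0 inside the range so it maps exactly
    hi = max(max(samples), 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = round(-128 - lo / scale)
    return scale, zero_point


def quantize_activation(x: float, scale: float, zero_point: int) -> int:
    """Quantize one activation to INT8, clamping to the representable range."""
    return max(-128, min(127, round(x / scale) + zero_point))


def dequantize(q: int, scale: float, zero_point: int = 0) -> float:
    """Recover the approximate real value from an INT8 code."""
    return scale * (q - zero_point)


def to_float16(x: float) -> float:
    """Round a value to IEEE 754 half precision, returned as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]
```

Dynamic range quantization corresponds to applying only the weight mapping, full integer quantization additionally needs the calibration step for activations, and Float16 quantization amounts to the precision cast in `to_float16`.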
Quantization Type | Compression | Accuracy Preservation | Latency Reduction | Calibration Data | Use Case |
---|---|---|---|---|---|
Dynamic Range Quantization | Moderate | High | Low | Not Required | Prototyping, general-purpose CPUs |
Full Integer Quantization | High | Moderate | High | Required | Real-time detection on low-power devices |
Float16 Quantization | Moderate | Very High | Moderate | Not Required | GPUs or FP16-supported hardware |
3.3. FL-Based Global and Local Model Update to Enhance Deepfake Detection Performance
Algorithm 3: Global Model Update with Federated Averaging (FedAvg)
1: Input: Local_Model_Updates_i from N Edge devices (i = 1 to N), Initial Global_Model parameters ($\theta_0$)
2: Output: Updated Global_Model parameters ($\theta_T$)
3: Process:
4: Initialize Global_Model with parameters $\theta_0$.
5: for each Federated Learning round t = 1 to T do
6:   Collect Local_Model_Updates_i ($\theta_t^{i}$) from all Edge devices i = 1 to N.
7:   Perform Federated Averaging:
8:     $\theta_t = \sum_{i=1}^{N} \frac{n_i}{n}\,\theta_t^{i}$, where $n_i$ is the local dataset size of device $i$ and $n = \sum_{i=1}^{N} n_i$
9:   Update Global_Model parameters to $\theta_t$
10:   Distribute Global_Model parameters $\theta_t$ to all Edge devices.
11: end for
12: Return final Global Model
- 1. Initialization of the Global Model: The first step in the Federated Averaging process is to initialize the global model with the initial parameters $\theta_0$. These initial parameters can be either pre-trained model weights or randomly initialized values. This global model serves as the starting point for the training and aggregation process across multiple edge devices. The central server (aggregator) stores the global model and coordinates the update process.
- Local training: Each edge device trains its local model on its own dataset.
- Local updates: After training, the local device sends its model updates (parameters) to the central server.
- Global model update: The server aggregates the local updates and computes the new global model using Federated Averaging.
- Global model distribution: The updated global model is sent back to all edge devices for further local training in the next round.
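- 2. Perform Federated Averaging: After collecting the local updates from all devices, the central server performs Federated Averaging. In this step, the server computes a weighted average of the local model parameters, with each device weighted by the size of its local dataset. The updated global model is computed as $\theta_t = \sum_{i=1}^{N} \frac{n_i}{n}\,\theta_t^{i}$, where $\theta_t^{i}$ denotes the parameters received from device $i$ in round $t$, $n_i$ is the number of local training samples on device $i$, and $n = \sum_{i=1}^{N} n_i$.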
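The aggregation step can be sketched as a weighted average over flat parameter vectors. This is a simplified illustration, assuming each device reports its parameters as a list of floats together with its local sample count; the `run_rounds` helper and the `local_train` callback are hypothetical names introduced only to show how distribution, local training, and aggregation alternate.

```python
from typing import Callable, List, Sequence


def fed_avg(local_params: Sequence[Sequence[float]],
            sample_counts: Sequence[int]) -> List[float]:
    """Weighted average of per-device parameter vectors (weights n_i / n)."""
    if not local_params or len(local_params) != len(sample_counts):
        raise ValueError("need one sample count per device")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(local_params[0])
    averaged = [0.0] * dim
    for params, n_i in zip(local_params, sample_counts):
        if len(params) != dim:
            raise ValueError("all parameter vectors must have the same length")
        for k, value in enumerate(params):
            averaged[k] += (n_i / total) * value
    return averaged


def run_rounds(global_params: List[float],
               local_train: Callable[[int, List[float]], List[float]],
               sample_counts: Sequence[int],
               rounds: int) -> List[float]:
    """Repeat: distribute the global model, train locally, aggregate."""
    for _ in range(rounds):
        updates = [local_train(i, global_params)
                   for i in range(len(sample_counts))]
        global_params = fed_avg(updates, sample_counts)
    return global_params
```

For example, two devices holding 1 and 3 samples and reporting `[1.0, 2.0]` and `[3.0, 4.0]` are combined with weights 0.25 and 0.75, so the device with more data pulls the global model further toward its own parameters.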
4. Simulation and Analysis of the Proposed Architecture
4.1. Simulation
4.1.1. Simulation Environment
4.1.2. Simulation Result
4.2. Analysis
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Jeremiah, S.R.; Ha, J.; Singh, S.K.; Park, J.H. PrivacyGuard: Collaborative Edge-Cloud Computing Architecture for Attribute-Preserving Face Anonymization in CCTV Networks. Hum. Centric Comput. Inf. Sci. 2024, 14, 1–17.
2. Castro, O.E.L.; Deng, X.; Park, J.H. Comprehensive Survey on AI-Based Technologies for Enhancing IoT Privacy and Security: Trends, Challenges, and Solutions. Hum. Centric Comput. Inf. Sci. 2023, 13, 39.
3. Welsh, B.C.; Piza, E.L.; Thomas, A.L.; Farrington, D.P. Private security and closed-circuit television (CCTV) surveillance: A systematic review of function and performance. J. Contemp. Crim. Justice 2020, 36, 56–69.
4. Kim, K.Y.; Yang, Y.B.; Kim, M.R.; Park, J.S.; Kim, J. MBTI Personality Type Prediction Model Using WZT Analysis Based on the CNN Ensemble and GAN. Hum. Centric Comput. Inf. Sci. 2023, 13, 14.
5. Khan, P.W.; Byun, Y.C.; Park, N. A data verification system for CCTV surveillance cameras using blockchain technology in smart cities. Electronics 2020, 9, 484.
6. López-Gil, J.M.; Gil Iranzo, R.M.; García González, R. Analysis of the Reliability of Deepfake Facial Emotion Expression Synthesis. Hum. Centric Comput. Inf. Sci. 2024, 14.
7. Mittal, G.; Hegde, C.; Memon, N. GOTCHA: Real-time video deepfake detection via challenge-response. In Proceedings of the 2024 IEEE 9th European Symposium on Security and Privacy (EuroS&P), Vienna, Austria, 8–12 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–20.
8. Yu, P.; Xia, Z.; Fei, J.; Lu, Y. A survey on deepfake video detection. IET Biom. 2021, 10, 607–624.
9. Lu, L.; Wang, Y.; Zhuo, W.; Zhang, L.; Gao, G.; Guo, Y. Deepfake Detection Via Separable Self-Consistency Learning. In Proceedings of the 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 27–30 October 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 3264–3270.
10. Pan, D.; Sun, L.; Wang, R.; Zhang, X.; Sinnott, R.O. Deepfake detection through deep learning. In Proceedings of the 2020 IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), Leicester, UK, 7–10 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 134–143.
11. Stankov, I.S.; Dulgerov, E.E. Detection of Deepfake Images and Videos Using SVM, CNN, and Hybrid Approaches. In Proceedings of the 2024 XXXIII International Scientific Conference Electronics (ET), Sozopol, Bulgaria, 17–19 September 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–5.
12. Nguyen, H.H.; Yamagishi, J.; Echizen, I. Capsule Networks for Deepfake Detection. In Handbook of Digital Face Manipulation and Detection; Elsevier Neurocomputing: Amsterdam, The Netherlands, 2023.
13. Rossler, A.; Cozzolino, D.; Verdoliva, L.; Riess, C.; Thies, J.; Nießner, M. Faceforensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1–11.
14. Rana, M.S.; Sung, A.H. Deepfakestack: A deep ensemble-based learning technique for deepfake detection. In Proceedings of the 2020 7th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2020 6th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom), New York, NY, USA, 1–3 August 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 70–75.
15. Wu, J.; Zhang, B.; Li, Z.; Pang, G.; Teng, Z.; Fan, J. Interactive two-stream network across modalities for deepfake detection. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 6418–6430.
16. Al-Dulaimi, O.A.H.H.; Kurnaz, S. A hybrid CNN-LSTM approach for precision deepfake image detection based on transfer learning. Electronics 2024, 13, 1662.
17. Kumar, N.; Pranav, P.; Nirney, V.; Geetha, V. Deepfake image detection using CNNs and transfer learning. In Proceedings of the 2021 International Conference on Computing, Communication and Green Engineering (CCGE), Pune, India, 23–25 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6.
18. Yang, R.; You, K.; Pang, C.; Luo, X.; Lan, R. CSTAN: A Deepfake Detection Network with CST Attention for Superior Generalization. Sensors 2024, 24, 7101.
19. Liao, X.; Wang, Y.; Wang, T.; Hu, J.; Wu, X. FAMM: Facial muscle motions for detecting compressed deepfake videos over social networks. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 7236–7251.
20. Tian, Y.; Liao, X.; Dong, L.; Xu, Y.; Jiang, H. Amount-based covert communication over blockchain. IEEE Trans. Netw. Serv. Manag. 2024, 21, 3095–3111.
21. Chen, J.; Liao, X.; Wang, W.; Qian, Z.; Qin, Z.; Wang, Y. SNIS: A signal noise separation-based network for post-processed image forgery detection. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 935–951.
22. Yazdinejad, A.; Dehghantanha, A.; Karimipour, H.; Srivastava, G.; Parizi, R.M. A robust privacy-preserving federated learning model against model poisoning attacks. IEEE Trans. Inf. Forensics Secur. 2024, 19, 6693–6708.
23. Namakshenas, D.; Yazdinejad, A.; Dehghantanha, A.; Srivastava, G. Federated quantum-based privacy-preserving threat detection model for consumer internet of things. IEEE Trans. Consum. Electron. 2024, 70, 5829–5838.
24. Namakshenas, D.; Yazdinejad, A.; Dehghantanha, A.; Parizi, R.M.; Srivastava, G. IP2FL: Interpretation-based privacy-preserving federated learning for industrial cyber-physical systems. IEEE Trans. Ind. Cyber-Phys. Syst. 2024, 2, 321–330.
25. Yazdinejad, A.; Dehghantanha, A.; Parizi, R.M.; Hammoudeh, M.; Karimipour, H.; Srivastava, G. Block hunter: Federated learning for cyber threat hunting in blockchain-based IIoT networks. IEEE Trans. Ind. Inform. 2022, 18, 8356–8366.
26. Zhang, B.; Yin, Q.; Lu, W.; Luo, X. Deepfake Detection and Localization Using Multi-View Inconsistency Measurement. IEEE Trans. Dependable Secur. Comput. 2024, 1–14.
27. Naitali, A.; Ridouani, M.; Salahdine, F.; Kaabouch, N. Deepfake attacks: Generation, detection, datasets, challenges, and research directions. Computers 2023, 12, 216.
28. Malik, A.; Kuribayashi, M.; Abdullahi, S.M.; Khan, A.N. DeepFake detection for human face images and videos: A survey. IEEE Access 2022, 10, 18757–18775.
29. Edwards, P.; Nebel, J.C.; Greenhill, D.; Liang, X. A Review of Deepfake Techniques: Architecture, Detection and Datasets. IEEE Access 2024, 12, 154718–154742.
30. Patel, Y.; Tanwar, S.; Gupta, R.; Bhattacharya, P.; Davidson, I.E.; Nyameko, R.; Vimal, V. Deepfake generation and detection: Case study and challenges. IEEE Access 2023, 11, 143296–143323.
31. Gong, L.Y.; Li, X.J. A contemporary survey on deepfake detection: Datasets, algorithms, and challenges. Electronics 2024, 13, 585.
32. EfficientNet-Lite Pytorch. Available online: https://github.com/RangiLyu/EfficientNet-Lite (accessed on 12 October 2024).
Reference | Authors | Techniques | Description | Limitations |
---|---|---|---|---|
[9] | Lu et al. | Separable Self-Consistency Learning | Uses self-supervised learning to enhance generalization across datasets. | Vulnerable to sophisticated deepfakes that mimic natural consistencies. |
[10] | Pan et al. | Deep Learning (Xception, MobileNet) | Benchmarks deep learning models to detect visual anomalies in deepfake videos. | High computational demands make it unsuitable for edge environments. |
[11] | Stankov et al. | Hybrid CNN-SVM Model | Combines CNN for feature extraction and SVM for classification. | Computationally complex; challenging for real-time, low-power deployment. |
[12] | Nguyen et al. | Capsule Networks | Captures spatial relationships to identify manipulation artifacts. | Struggles with large-scale datasets and diverse manipulation types. |
[13] | Rössler et al. | CNN-Based Detection | Uses CNNs to identify manipulated regions in images from FaceForensics++. | Performance degrades with compression. |
[14] | Rana et al. | DeepfakeStack (Ensemble Learning) | Proposed an ensemble-based learning technique that combines multiple deep neural networks to enhance detection accuracy by leveraging the strengths of individual models. | Computationally intensive, which may hinder real-time detection capabilities. |
[15] | Wu et al. | Interactive Two-Stream Network (ITSNet) | Introduced ITSNet to explore discriminant inconsistency representations from cross-modal perspectives, integrating spatial and temporal features for deepfake detection. | The two-stream architecture is computationally demanding, limiting applicability in resource-constrained environments. |
[16] | Al-Dulaimi et al. | Hybrid CNN-LSTM with Transfer Learning | Presented a hybrid architecture combining CNNs for feature extraction and LSTMs for sequence analysis, achieving high detection accuracy using transfer learning techniques. | The hybrid model’s complexity increases computational overhead, posing challenges for real-time applications. |
[17] | Kumar et al. | CNNs, Transfer Learning | Proposed a method using pre-trained CNN architectures, such as ResNet and EfficientNet, to detect deepfake images. Fine-tuning on deepfake datasets improves accuracy while reducing training overhead. | Generalizability is limited when encountering novel deepfake techniques unseen during pre-training. |
[18] | Yang et al. | CST Attention Mechanism (CSTAN) | Proposed a deepfake detection network utilizing Channel, Spatial, and Triple attention mechanisms to recalibrate feature maps and enhance detection accuracy. | Computational complexity may limit real-time applications or deployment on resource-constrained devices. |
[19] | Xin Liao et al. | Facial Muscle Motion Analysis, Temporal Feature Fusion, Dempster–Shafer Fusion | Extracts geometric features from facial landmarks to identify unnatural muscle movements, models temporal inconsistencies across frames, and fuses predictions for robust Deepfake detection. | Relies on precise landmark extraction, which may be affected by occlusion or noise, and has increased computational complexity due to multi-stage processing. |
[20] | Yang Tian et al. | Amount-Based Encoding and Blockchain Utilization | Encodes messages in transaction amounts with AMASC. | Limited to Bitcoin infrastructure. |
[21] | Jiaxin Chen et al. | Signal Noise Separation, Multi-Scale Feature Learning, Feature Fusion | Separates tampered regions and enhances feature precision. | May struggle with extreme noise or heavy compression. |
[22] | Abbas Yazdinejad et al. | Encrypted Gradient Auditing and Homomorphic Encryption | Uses GMM and MD to identify malicious updates and AHE for secure gradient aggregation. | Performance depends on auditor reliability and may incur computational complexity in large-scale deployments. |
[23] | Danyal Namakshenas et al. | Quantum-Based Authentication and Additive Homomorphic Encryption | Validates client integrity and secures FL data with quantum authentication and AHE. | Dependency on quantum infrastructure and increased complexity for large-scale CIoT systems. |
[24] | Danyal Namakshenas et al. | Additive Homomorphic Encryption, Shapley Values, Dual Feature Selection | Combines AHE for secure aggregation, SV for explainable decisions, and feature selection for efficient data management. | High computational demand and challenges integrating legacy systems. |
[25] | Abbas Yazdinejad et al. | Federated Learning, Cluster-Based Architecture, Machine Learning Models (NED, IF, CBLOF) | Combines FL with cluster-based anomaly detection to enhance security and privacy in blockchain-based IIoT. | Limited scalability with increasing clusters and dependency on diverse datasets for generalization. |
[26] | Bolin Zhang et al. | Noise Inconsistency and Temporal Inconsistency Analysis | Identifies tampered regions by measuring noise and temporal inconsistencies across video frames. | May face challenges with videos exhibiting high compression or extreme visual distortions. |
Component | Detail |
---|---|
Dataset | FaceForensics++: High-quality benchmark dataset for deepfake detection |
Dataset | Dataset of Real and Fake Videos with Diverse Manipulation Techniques Collected via Proprietary CCTV Systems |
Dataset | Compression levels (C0, C23, C40) used to test robustness under different quality settings |
Hardware | Edge Device: Raspberry Pi 4 Model B (4 GB RAM, Quad-core Cortex-A72 CPU) |
Hardware | Cloud Server: NVIDIA Tesla V100 GPU, 32 GB VRAM for Federated Model Aggregation |
Software | Model Framework: TensorFlow Lite for Edge Deployment |
Software | Federated Learning Framework: TensorFlow Federated |
Pre-processing | Input Resolution: 260 × 260 pixels |
Pre-processing | Normalization: Pixel values normalized to [0, 1] |
Pre-processing | Tensor Conversion: Input frames converted into tensors for inference |
Training Setup | Number of Edge Devices: 100 |
Training Setup | Local Training Epochs: 5 per FL round |
Training Setup | Global Aggregation Rounds: 50 |
Baseline Comparison | EfficientNet-B4 (non-tiny version) as a baseline to compare performance in resource-intensive setups |
Dataset Version | Accuracy (%) | F1-Score (%) | ROC-AUC | Latency (ms) |
---|---|---|---|---|
C0 | 96.5 | 95.8 | 0.97 | 11.7 |
C23 | 94.2 | 93.5 | 0.96 | 12 |
C40 | 89.8 | 89.2 | 0.94 | 12.2 |
Model | Dataset | Accuracy (%) | F1-Score (%) | ROC-AUC | Latency (ms) |
---|---|---|---|---|---|
FL-TENB4 (Proposed) | FaceForensics++, Custom Dataset | 94.2 | 93.5 | 0.96 | 12 |
Lin Lu et al. [9] | FaceForensics++ | 92.7 | 91.9 | 0.95 | 15 |
Deng Pan et al. [10] | FaceForensics++ | 90.5 | 89.7 | 0.93 | 20 |
Stankov et al. [11] | Custom Dataset | 88.9 | 87.5 | 0.91 | 25 |
Bolin Zhang et al. [26] | Custom Dataset | 91.8 | 90.2 | 0.94 | 18 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ha, J.; El Azzaoui, A.; Park, J.H. FL-TENB4: A Federated-Learning-Enhanced Tiny EfficientNetB4-Lite Approach for Deepfake Detection in CCTV Environments. Sensors 2025, 25, 788. https://doi.org/10.3390/s25030788