Few-Shot Cross-Domain Deepfake Detection for Edge Devices: A Feature Decoupled System Architecture
Abstract
1. Introduction
1. We propose a decoupled “feature extraction–normalization alignment–independent decision” detection framework for few-shot tasks on edge devices. Compared with tightly coupled end-to-end models, its modular separation reduces the risk of feature-space collapse under significant distribution shifts, providing a lightweight and robust pipeline for detection tasks restricted to CPU-only compute.
2. We introduce a lightweight residual normalization adapter as the central alignment module. Experiments confirm that this adapter provides a variance-aligned, smooth manifold base, yielding measurable improvements for ensemble tree models under extremely limited samples and reducing the relative error rate in cross-domain evaluations.
3. Based on qualitative signal analysis and statistical observations, we discuss the applicable boundaries of the framework. Combining spectral orthogonality and ablation experiments, we analyze the mechanism of “normalization-dominated, residual-assisted” alignment and explore the performance limitations of linear alignment architectures when facing cross-mechanism generative models (e.g., from GANs to diffusion models), providing a reference for evaluating the system’s deployment in practical engineering.
2. Related Work
2.1. Few-Shot and Cross-Domain Classification
2.2. Feature Optimization and Decoupled Adaptation Architectures
2.3. Application Examples of Physical Prior Features
2.4. Gradient Boosting and Modern Ensemble Classifiers
3. Methodology
3.1. Problem Formulation and Feature Representation
- Original Physical Features (144 dimensions): Lightweight image signal processing executed on the CPU:
  - Spatial-domain texture (84 dims): Extract LBP and GLCM statistics in corresponding sub-regions to capture high-frequency discontinuities [11].
  - Ocular physiology (12 dims): Extract clarity, symmetry, and fine-grained pupil metrics from the left and right eye regions.
- Original Deep Semantic Features (256 dimensions): Extracted with a pre-trained ResNet-18 whose classification layer has been removed. ResNet-18 is selected as the semantic backbone because its small parameter count and low computational cost fit the strict resource constraints of edge devices while still providing sufficiently discriminative deep features. Activation vectors are extracted for both face crops and full images, yielding 256-dimensional descriptors via truncated concatenation (accelerated by ONNX Runtime).
- Hybrid Aligned Features (456 dimensions): To mitigate the initial mismatch between physical and deep features, the system zero-pads the physical features (truncating any excess) into a 200-dimensional block and concatenates it with the 256-dimensional deep features, giving a 200 + 256 = 456-dimensional representation. The 200-dimensional size is chosen as a round multiple of fifty that accommodates all 144 physical feature dimensions with headroom for future feature additions, while keeping the padded physical block and the 256-dimensional deep block at comparable scales so that neither modality dominates downstream processing.
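The padding and concatenation step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `build_hybrid` and the constant names are hypothetical, and only the dimensions (144, 200, 256, 456) come from the text.

```python
import numpy as np

PHYS_DIM = 144    # physical feature length stated in the text
PHYS_BLOCK = 200  # size of the zero-padded physical block
DEEP_DIM = 256    # deep semantic descriptor length


def build_hybrid(phys: np.ndarray, deep: np.ndarray) -> np.ndarray:
    """Zero-pad (or truncate) `phys` to 200 dims, then append the 256-dim deep vector."""
    phys = np.asarray(phys, dtype=np.float32).ravel()
    deep = np.asarray(deep, dtype=np.float32).ravel()
    if deep.shape[0] != DEEP_DIM:
        raise ValueError(f"expected {DEEP_DIM} deep features, got {deep.shape[0]}")
    block = np.zeros(PHYS_BLOCK, dtype=np.float32)
    n = min(phys.shape[0], PHYS_BLOCK)  # truncate if phys is longer than the block
    block[:n] = phys[:n]
    return np.concatenate([block, deep])


# Dummy inputs: ones for the physical part, twos for the deep part.
hybrid = build_hybrid(np.ones(PHYS_DIM), np.full(DEEP_DIM, 2.0))
print(hybrid.shape)
```

In this sketch, positions 144 to 199 of the output stay at zero, which is where additional physical features could later be placed without changing the downstream input size.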
3.2. Lightweight Residual Normalization Adapter
3.2.1. Optimization Engine Driving Mechanism
3.2.2. Training Protocol and Hyperparameters
3.3. Decoupled Learning and Deployment System
Algorithm 1. Inference Pipeline of the Decoupled Adaptation System
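As a rough illustration of a three-stage inference flow of the kind the framework names (feature extraction, normalization alignment, independent decision), a minimal sketch might look like the following. Everything specific here is an assumption rather than the authors' implementation: the adapter form (per-dimension standardization plus a small linear residual with untrained placeholder weights), the class and function names, and the nearest-centroid classifier, which is only a dependency-free stand-in for the tree and kernel classifiers evaluated in the experiments.

```python
import numpy as np


class ResidualNormAdapter:
    """Hypothetical adapter: standardize each dimension, then add a small linear residual."""

    def __init__(self, dim: int, alpha: float = 0.1, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.alpha = alpha
        # Untrained placeholder weights; a real system would learn these.
        self.W = rng.normal(scale=0.01, size=(dim, dim))
        self.mean = np.zeros(dim)
        self.std = np.ones(dim)

    def fit_stats(self, X: np.ndarray) -> "ResidualNormAdapter":
        self.mean = X.mean(axis=0)
        self.std = X.std(axis=0)
        return self

    def __call__(self, X: np.ndarray) -> np.ndarray:
        Z = (X - self.mean) / (self.std + 1e-6)  # normalization branch
        return Z + self.alpha * (Z @ self.W)      # residual branch


class NearestCentroid:
    """Stand-in decision stage; the paper's classifiers are tree and kernel models."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "NearestCentroid":
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


def run_pipeline(support_X, support_y, query_X):
    """Align features with the adapter, then let a separate classifier decide."""
    adapter = ResidualNormAdapter(support_X.shape[1]).fit_stats(support_X)
    clf = NearestCentroid().fit(adapter(support_X), support_y)
    return clf.predict(adapter(query_X))


# Synthetic 456-dim features, K = 5 support samples per class (0 = real, 1 = fake).
rng = np.random.default_rng(0)
dim, k = 456, 5
support_X = np.vstack([rng.normal(0.0, 1.0, (k, dim)), rng.normal(3.0, 1.0, (k, dim))])
support_y = np.array([0] * k + [1] * k)
query_X = np.vstack([rng.normal(0.0, 1.0, (4, dim)), rng.normal(3.0, 1.0, (4, dim))])
query_y = np.array([0] * 4 + [1] * 4)
pred = run_pipeline(support_X, support_y, query_X)
print(pred)
```

The point of the sketch is the separation of concerns: the adapter never sees labels, and the classifier can be swapped without touching the alignment stage.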
3.4. Empirical Observations on Generative Mechanism Differences
4. Experimental Evaluation and Application Examples
4.1. Experimental Setup
4.2. System Classifier Matrix Performance Evaluation
4.3. System-Level Architecture Comparative Evaluation
4.4. Discussion on Multi-Dimensional Evaluation Metrics
4.5. Deployment Efficiency and Computational Overhead
5. Analysis and Discussion
5.1. Ablation Study of the Alignment Module
5.2. Performance Boundary Observation Across Generative Mechanisms
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Singh, R.; Gill, S.S. Edge AI: A survey. Internet Things Cyber-Phys. Syst. 2023, 3, 71–92.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sutskever, I. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 8748–8763.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. Adv. Neural Inf. Process. Syst. 2017, 30, 4080–4090.
- Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1126–1135.
- Kang, B.; Xie, S.; Rohrbach, M.; Yan, Z.; Gordo, A.; Feng, J.; Kalantidis, Y. Decoupling representation and classifier for long-tailed recognition. arXiv 2019, arXiv:1910.09217.
- Guo, Y.; Codella, N.C.; Karlinsky, L.; Codella, J.V.; Smith, J.R.; Saenko, K.; Feris, R. A broader study of cross-domain few-shot learning. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 124–141.
- Corvi, R.; Cozzolino, D.; Poggi, G.; Nagano, K.; Verdoliva, L. On the detection of synthetic images generated by diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 18444–18453.
- Houlsby, N.; Giurgiu, A.; Jastrzebski, S.; Morrone, B.; De Laroussilhe, Q.; Gesmundo, A.; Gelly, S. Parameter-efficient transfer learning for NLP. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 2790–2799.
- Frank, J.; Eisenhofer, T.; Schönherr, L.; Fischer, A.; Kolossa, D.; Holz, T. Leveraging frequency analysis for deep fake image recognition. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 3247–3258.
- Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987.
- Rana, M.S.; Nobi, M.N.; Murali, B.; Sung, A.H. Deepfake detection: A systematic literature review. Sensors 2023, 23, 8763.
- Zi, B.; Chang, M.; Chen, J.; Ma, X.; Jiang, Y.G. WildDeepfake: A challenging real-world dataset for deepfake detection. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 2382–2390.
- Haralick, R.M.; Shanmugam, K.; Dinstein, I. Textural features for image classification. IEEE Trans. Syst. Man. Cybern. 1973, 3, 610–621.
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3149–3157.
- Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 2018, 31, 6639–6649.
- Durall, R.; Keuper, M.; Pfreundt, F.J.; Keuper, J. Watch your up-convolution: CNN based generative deep neural networks are failing to reproduce spectral distributions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7890–7899.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
- Tan, M.; Le, Q.V. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
- Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 10347–10357.
- Wang, S.Y.; Wang, O.; Zhang, R.; Owens, A.; Efros, A.A. CNN-generated images are surprisingly easy to spot... for now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8695–8704.
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
- Ojha, U.; Li, Y.; Lee, Y.J. Towards universal fake image detectors that generalize across generative models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 24480–24489.
- Kessy, A.; Lewin, A.; Strimmer, K. Optimal whitening and decorrelation. Am. Stat. 2018, 72, 309–314.



| | | Setting A (SG3-UNSEEN) | | | | Setting B (SD-UNSEEN) | | | |
|---|---|---|---|---|---|---|---|---|---|
| Arch. | Classifier | K = 1 | K = 3 | K = 5 | K = 10 | K = 1 | K = 3 | K = 5 | K = 10 |
| Raw | RF (raw) | ||||||||
| SVM (raw) | |||||||||
| XGBoost | |||||||||
| LightGBM | |||||||||
| CatBoost | |||||||||
| KAN (raw) | |||||||||
| Ours | RF + | 0.832 ± 0.018 | 0.840 ± 0.006 | 0.849 ± 0.010 | |||||
| SVM + | 0.614 ± 0.053 | 0.834 ± 0.017 | 0.014 | 0.004 | 0.726 ± 0.008 | ||||
| XGB + | 0.707 ± 0.047 | 0.791 ± 0.030 | 0.721 ± 0.007 | ||||||
| LGBM + | 0.684 ± 0.009 | 0.709 ± 0.018 | 0.803 ± 0.017 | ||||||
| CAT + | 0.835 ± 0.005 | 0.835 ± 0.002 | 0.849 ± 0.011 | ||||||
| KAN + | 0.819 ± 0.045 | 0.838 ± 0.028 | 0.852 ± 0.026 | ||||||
| | | Setting A (SG3-UNSEEN) | | | | Setting B (SD-UNSEEN) | | | |
|---|---|---|---|---|---|---|---|---|---|
| Arch. | Classifier | K = 1 | K = 3 | K = 5 | K = 10 | K = 1 | K = 3 | K = 5 | K = 10 |
| Raw | RF (raw) | ||||||||
| SVM (raw) | |||||||||
| XGBoost | |||||||||
| LightGBM | |||||||||
| CatBoost | |||||||||
| KAN (raw) | |||||||||
| Ours | RF + | 0.886 ± 0.016 | 0.929 ± 0.015 | 0.940 ± 0.009 | 0.905 ± 0.007 | 0.923 ± 0.007 | 0.950 ± 0.005 | ||
| SVM + | 0.642 ± 0.007 | 0.887 ± 0.013 | 0.924 ± 0.014 | 0.926 ± 0.016 | 0.695 ± 0.017 | 0.896 ± 0.007 | 0.938 ± 0.005 | 0.941 ± 0.006 | |
| XGB + | 0.897 ± 0.016 | 0.923 ± 0.007 | |||||||
| LGBM + | 0.734 ± 0.023 | 0.850 ± 0.035 | 0.911 ± 0.004 | 0.766 ± 0.012 | 0.861 ± 0.008 | 0.898 ± 0.008 | |||
| CAT + | 0.878 ± 0.020 | 0.927 ± 0.018 | 0.942 ± 0.014 | 0.885 ± 0.009 | 0.940 ± 0.005 | 0.955 ± 0.004 | |||
| KAN + | 0.643 ± 0.007 | 0.888 ± 0.014 | 0.921 ± 0.017 | 0.937 ± 0.014 | 0.665 ± 0.014 | 0.885 ± 0.009 | 0.935 ± 0.005 | 0.955 ± 0.004 | |
| | UNSEEN (StyleGAN3) | | | | UNSEEN (Stable Diffusion) | | | | SEEN (ProGAN) | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | K = 1 | K = 3 | K = 5 | K = 10 | K = 1 | K = 3 | K = 5 | K = 10 | K = 1 | K = 3 | K = 5 | K = 10 |
| CNNDetect | 50.7 | 50.2 | 50.1 | 49.9 | 54.1 | 53.7 | 52.5 | 51.9 | 99.1 | 99.6 | 99.5 | 99.7 |
| UnivFD | 51.1 | 60.2 | 63.8 | 69.1 | 49.8 | 52.2 | 53.4 | 54.2 | 52.7 | 55.8 | 57.3 | 61.0 |
| ProtoNet | 53.1 | 53.9 | 53.6 | 53.5 | 59.8 | 68.9 | 73.2 | 73.3 | 79.2 | 85.9 | 81.4 | 77.8 |
| MAML | 53.9 | 54.3 | 54.6 | 54.3 | 53.9 | 66.8 | 70.6 | 71.8 | 84.4 | 91.8 | 86.7 | 86.7 |
| XceptionNet | 54.3 | 54.1 | 53.6 | 53.4 | 58.7 | 61.6 | 61.3 | 61.8 | 95.9 | 97.7 | 97.4 | 97.6 |
| MobileNetV2 [19] | 51.1 | 50.4 | 50.6 | 50.2 | 52.1 | 51.1 | 50.7 | 51.0 | 87.0 | 99.4 | 99.6 | 99.7 |
| EfficientNet-B0 [20] | 51.4 | 50.3 | 50.0 | 49.9 | 58.0 | 57.3 | 57.3 | 58.6 | 97.9 | 99.8 | 100.0 | 99.9 |
| ViT-Tiny (DeiT) [21] | 52.5 | 51.7 | 51.4 | 51.0 | 55.3 | 52.8 | 52.4 | 52.1 | 96.8 | 99.3 | 99.7 | 99.6 |
| Ours * | 57.8 | 83.2 | 84.0 | 84.9 | 54.6 | 72.6 | 70.1 | 76.1 | 64.2 | 90.5 | 93.8 | 95.0 |
| | StyleGAN3 (UNSEEN) | | Stable Diffusion (UNSEEN) | |
|---|---|---|---|---|
| Method | Accuracy (%) | AUC | Accuracy (%) | AUC |
| CNNDetect [22] | 49.9 | 0.561 | 51.9 | 0.659 |
| XceptionNet [23] | 53.4 | 0.678 | 61.8 | 0.748 |
| UnivFD [24] | 69.1 | 0.514 | 54.2 | 0.514 |
| ProtoNet [4] | 53.5 | 0.647 | 73.3 | 0.834 |
| MAML [5] | 54.3 | 0.647 | 71.8 | 0.823 |
| Raw Baseline (Opt. Tree) | 82.7 | 0.965 | 75.1 | 0.812 |
| Raw Baseline (Opt. Kernel) | 85.4 | 0.980 | 77.8 | 0.888 |
| Ours (Opt. Tree + ) | 84.9 | 0.970 | 75.1 | 0.821 |
| Ours (Opt. Kernel + ) | 85.2 | 0.982 | 76.1 | 0.856 |
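To make the notion of a relative error rate concrete, the following sketch applies the usual definition (error = 100 − accuracy, reduction measured relative to the baseline error) to the StyleGAN3 accuracies in the table above. The function name is hypothetical, and this is not presented as the authors' own calculation.

```python
def relative_error_reduction(acc_base: float, acc_new: float) -> float:
    """Relative reduction in error rate, in percent; accuracies are given in percent."""
    err_base = 100.0 - acc_base
    err_new = 100.0 - acc_new
    return 100.0 * (err_base - err_new) / err_base


# StyleGAN3 (UNSEEN) accuracies taken from the table rows above.
tree_sg3 = relative_error_reduction(82.7, 84.9)    # Raw (Opt. Tree) -> Ours (Opt. Tree + )
kernel_sg3 = relative_error_reduction(85.4, 85.2)  # Raw (Opt. Kernel) -> Ours (Opt. Kernel + )

print(f"tree:   {tree_sg3:.1f}%")    # error 17.3 -> 15.1, about 12.7%
print(f"kernel: {kernel_sg3:.1f}%")  # error 14.6 -> 14.8, about -1.4% (slightly worse)
```

Under this definition the tree-based pair shows a relative error reduction of roughly 12.7% on StyleGAN3, while the kernel-based pair is marginally worse in accuracy even though its AUC is slightly higher.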
| Method | Params (M) | Per-Image Time (ms, 1-Thread CPU) | Peak Memory (MB) |
|---|---|---|---|
| CNNDetect [22] | 23.51 | 81.3 | ∼94.0 |
| XceptionNet [23] | 20.81 | 134.1 | ∼83.2 |
| ProtoNet/MAML [4,5] | 1.55 | 58.9 † | ∼6.2 |
| Ours (Static pre-proc.) | 11.69 | 897.2 | 25.0 |
| Ours (Lightweight query) | 0.21 | 7.7 | <1 |
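The per-image times above translate directly into throughput and latency ratios. The short sketch below does that arithmetic for the query stage only; the dictionary and function names are illustrative, and the numbers are copied from the table.

```python
# Per-image single-thread CPU latency in milliseconds, copied from the table above.
latency_ms = {
    "CNNDetect": 81.3,
    "XceptionNet": 134.1,
    "ProtoNet/MAML": 58.9,
    "Ours (Lightweight query)": 7.7,
}


def images_per_second(ms_per_image: float) -> float:
    """Convert a per-image latency in milliseconds into throughput."""
    return 1000.0 / ms_per_image


ours_ms = latency_ms["Ours (Lightweight query)"]
# Ratio of each method's latency to the lightweight query latency.
latency_ratio = {name: ms / ours_ms for name, ms in latency_ms.items()}

for name, ms in latency_ms.items():
    print(f"{name:26s} {images_per_second(ms):7.1f} img/s   {latency_ratio[name]:5.1f}x ours")
```

These ratios cover only the lightweight query row. The static pre-processing row (897.2 ms) is listed separately in the table, so an end-to-end comparison would need to account for when and how often that cost is paid.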
| Variant/Processing Module | K = 1 | K = 3 | K = 5 | K = 10 |
|---|---|---|---|---|
| Physical Only (144-dim) | 49.3 | 51.5 | 51.7 | 50.8 |
| Deep Only (256-dim) | 50.1 | 50.7 | 49.6 | 55.7 |
| Full + No Processing (Raw Concat) | 50.5 | 48.5 | 45.4 | 46.2 |
| Full + PCA | 50.1 | 51.8 | 49.0 | 52.1 |
| Full + ZCA | 50.5 | 52.9 | 52.7 | 51.2 |
| Full + (Ours Decoupled) | 57.8 | 83.2 | 84.0 | 84.9 |
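For readers unfamiliar with the ZCA baseline in the ablation above, a standard ZCA whitening transform can be sketched as follows. This is the textbook form (see the whitening reference in the bibliography), not necessarily the exact variant used in the experiments; the function name and the `eps` value are assumptions.

```python
import numpy as np


def zca_whiten(X: np.ndarray, eps: float = 1e-8):
    """Return ZCA-whitened data and the symmetric whitening matrix W."""
    Xc = X - X.mean(axis=0)                       # center each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)            # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigendecomposition of a symmetric matrix
    eigvals = np.clip(eigvals, 0.0, None)         # guard against tiny negative values
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W, W


# Correlated synthetic data: 500 samples, 8 features.
rng = np.random.default_rng(0)
A = np.eye(8) + 0.3 * rng.normal(size=(8, 8))
X = rng.normal(size=(500, 8)) @ A
Xw, W = zca_whiten(X)
cov_w = np.cov(Xw, rowvar=False)
print(np.round(np.diag(cov_w), 3))
```

When there are fewer samples than feature dimensions, as in a few-shot setting, the sample covariance is rank-deficient, so `eps` determines how strongly the near-zero directions are amplified.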
| K | GAN Intra-Family Acc | Cross-Domain Acc | Acc Drop (%) | AUC Drop (%) | Variance Change (%) |
|---|---|---|---|---|---|
| 1 | 0.515 | 0.489 | −4.2 | −0.6 | −50.3 |
| 3 | 0.546 | 0.426 | −20.8 | −26.4 | −79.7 |
| 5 | 0.596 | 0.438 | −26.5 | −35.7 | −82.4 |
| 10 | 0.647 | 0.424 | −34.0 | −50.6 | −80.9 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Ai, Z.; Xu, J.; Lin, W. Few-Shot Cross-Domain Deepfake Detection for Edge Devices: A Feature Decoupled System Architecture. Electronics 2026, 15, 1940. https://doi.org/10.3390/electronics15091940

