Multi-Stream Fusion of Spatial, Frequency, and Attention Features for Robust Deepfake Detection in Low-Resolution Images
DOI:
https://doi.org/10.32473/flairs.39.1.141438
Abstract
The increasing realism of generative models makes deepfake detection challenging, particularly under the low resolution and compression artifacts that are common in real-world media. While many detectors perform well on high-quality images, their performance degrades when fine-grained spatial details are suppressed, and approaches tailored to low-resolution inputs often fail to generalize across resolutions. We propose SFA-Fuse (Spatial–Frequency–Attention Fusion), a multi-stream deepfake detection framework that integrates spatial, frequency-domain, and noise residual features through lightweight attention-based fusion, enabling robust detection without image restoration. We evaluate SFA-Fuse on Celeb-DF V2 and FaceForensics++ across low, native, and high resolutions (128 × 128, 256 × 256, 384 × 384). SFA-Fuse achieves up to 99.6% accuracy on Celeb-DF V2 and 85.7% on FaceForensics++, highlighting the effectiveness of multi-domain feature fusion for practical deepfake detection.
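As a rough illustration of what attention-based fusion of several feature streams can look like, the sketch below weights three equal-length feature vectors with a softmax over per-stream scores and sums them. It is not the SFA-Fuse implementation: the function names, the `query` scoring vector, and the toy feature values are all assumptions made for this example, and in a real model the stream features and the scoring parameters would be learned.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(streams, query):
    """Fuse equal-length feature vectors using one attention weight per stream.

    streams: dict mapping stream name -> feature vector (list of floats)
    query:   hypothetical scoring vector (fixed here; learned in practice)
    """
    names = list(streams)
    # Score each stream by its dot product with the query vector.
    scores = [sum(q * x for q, x in zip(query, streams[n])) for n in names]
    weights = softmax(scores)
    # The fused vector is the attention-weighted sum of the stream features.
    fused = [
        sum(w * streams[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return fused, dict(zip(names, weights))

# Toy values standing in for the outputs of three hypothetical stream encoders.
streams = {
    "spatial":   [0.2, 0.1, 0.4],
    "frequency": [0.9, 0.3, 0.1],
    "noise":     [0.1, 0.8, 0.2],
}
query = [1.0, 0.5, -0.5]
fused, weights = attention_fuse(streams, query)
```

Because the weights form a convex combination, the fused vector stays within the range of the individual stream features, and the stream that scores highest against `query` contributes most to the result.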
License
Copyright (c) 2026 Vaishnavi Sen, Cody Laurie, Rashida Hasan

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.