Multi-Stream Fusion of Spatial, Frequency, and Attention Features for Robust Deepfake Detection in Low-Resolution Images

Authors

  • Vaishnavi Sen, California State University, Northridge
  • Cody Laurie, California State University, Northridge
  • Rashida Hasan, California State University, Northridge

DOI:

https://doi.org/10.32473/flairs.39.1.141438

Abstract

The increasing realism of generative models makes deepfake detection challenging under the low resolution and compression artifacts common in real-world media. Many detectors perform well on high-quality images, but their performance degrades when fine-grained spatial details are suppressed. Approaches tailored to low-resolution inputs, in turn, often fail to generalize across resolutions. We propose SFA-Fuse (Spatial–Frequency–Attention Fusion), a multi-stream deepfake detection framework. It integrates spatial, frequency-domain, and noise-residual features through lightweight attention-based fusion, which enables robust detection without image restoration. We evaluate SFA-Fuse on Celeb-DF V2 and FaceForensics++ at low, native, and high resolutions (128 × 128, 256 × 256, and 384 × 384). SFA-Fuse achieves up to 99.6% accuracy on Celeb-DF V2 and 85.7% on FaceForensics++, highlighting the effectiveness of multi-domain feature fusion for practical deepfake detection.
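The abstract does not specify layers, filters, or training details. The following is therefore only a minimal NumPy sketch of the general idea of fusing spatial, frequency-domain, and noise-residual streams with attention weights. It is not the authors' SFA-Fuse implementation. Several pieces are illustrative assumptions:

  • Grid average pooling stands in for learned CNN backbones.
  • A 3 × 3 box-blur high-pass residual stands in for whatever noise filters the paper uses.
  • A single fixed query vector stands in for learned attention parameters.

All function names are hypothetical.

```python
import numpy as np

def to_gray(img):
    """Convert an H x W x 3 RGB array in [0, 1] to luminance (ITU-R BT.601 weights)."""
    return img @ np.array([0.299, 0.587, 0.114])

def grid_pool(fmap, k=4):
    """Average a 2-D map over a k x k grid, giving k*k values at any input resolution."""
    h, w = fmap.shape
    hh, ww = h - h % k, w - w % k  # crop so both sides divide evenly by k
    cells = fmap[:hh, :ww].reshape(k, hh // k, k, ww // k)
    return cells.mean(axis=(1, 3)).ravel()

def spatial_stream(gray, k=4):
    # Hypothetical spatial stream: pooled pixel intensities
    return grid_pool(gray, k)

def frequency_stream(gray, k=4):
    # Hypothetical frequency stream: pooled, centered log-magnitude spectrum
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray)))
    return grid_pool(np.log1p(spectrum), k)

def noise_stream(gray, k=4):
    # Hypothetical noise stream: |image - 3x3 box blur|, using edge padding
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    blur = sum(padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0
    return grid_pool(np.abs(gray - blur), k)

def attention_fuse(streams, query):
    """Softmax-weighted sum of z-normalized stream vectors scored against a query vector."""
    s = np.stack([(v - v.mean()) / (v.std() + 1e-8) for v in streams])  # (n_streams, d)
    scores = s @ query / np.sqrt(s.shape[1])  # scaled dot-product score per stream
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ s, weights

def extract_fused(img, query, k=4):
    gray = to_gray(img)
    streams = [spatial_stream(gray, k), frequency_stream(gray, k), noise_stream(gray, k)]
    return attention_fuse(streams, query)

# Usage: the fused vector has the same length at 128, 256, and 384 pixels.
rng = np.random.default_rng(0)
query = rng.standard_normal(16)  # placeholder for a learned attention query (assumption)
for size in (128, 256, 384):
    fused, weights = extract_fused(rng.random((size, size, 3)), query)
    print(size, fused.shape, np.round(weights, 3))
```

Pooling each stream to a fixed grid keeps the fused representation the same size across the three evaluated resolutions. In this toy setting, that is one simple way to let a single downstream classifier accept inputs of any of those sizes.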


Published

06-05-2026

How to Cite

Sen, V., Laurie, C., & Hasan, R. (2026). Multi-Stream Fusion of Spatial, Frequency, and Attention Features for Robust Deepfake Detection in Low-Resolution Images. The International FLAIRS Conference Proceedings, 39(1). https://doi.org/10.32473/flairs.39.1.141438