• KSII Transactions on Internet and Information Systems
    Monthly Online Journal (eISSN: 1976-7277)

A Comprehensive Deepfake Detection Framework Based on Multimodal Liveness Detection and Deep Learning Integration


Abstract

Deepfake technology poses severe security and trust challenges, demanding robust detection strategies. We propose a unified audio–visual framework that jointly exploits facial dynamics and speech acoustics to capture subtle forgery traces. The system integrates CNN-based face encoders for spatial features, RNN/Temporal-Conformer blocks for temporal cues, and lightweight Transformers for contextual modeling. An adaptive mid–late fusion module aggregates multimodal embeddings via gated attention and a calibration head, ensuring resilience against partial modality corruption. Preprocessing involves face detection and cropping, log-mel spectrogram extraction from 16 kHz audio, and alignment of video and audio segments, with a unimodal fallback mechanism for instances of missing modalities. Experiments on FaceForensics++, DFDC, and FakeAVCeleb, with subject- and speaker-disjoint splits, demonstrate strong generalization, achieving up to 97.3% accuracy and 99.0% AUC under clean conditions. Key contributions include a principled multimodal fusion strategy and a plug-and-play ensemble mechanism that stabilizes training across datasets. Limitations such as computational overhead and the need for further adversarial robustness testing are discussed, while reproducibility is facilitated through released configurations and scripts. Overall, this work advances multimodal deepfake detection by offering an efficient, accurate, and extensible defense framework.
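
The fusion with unimodal fallback described in the abstract can be illustrated with a minimal sketch. This is not the authors' module: it assumes a single scalar sigmoid gate over the concatenated embeddings in place of the paper's gated attention and calibration head, and the names `gated_fusion`, `gate_weights`, and `gate_bias` are hypothetical. It only shows the general idea of weighting two modality embeddings and falling back to one when the other is missing.

```python
import math
from typing import List, Optional, Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    video: Optional[Sequence[float]],
    audio: Optional[Sequence[float]],
    gate_weights: Sequence[float],
    gate_bias: float = 0.0,
) -> List[float]:
    """Fuse two same-sized embeddings with a scalar gate (illustrative only).

    Assumption: the gate is sigmoid(w . [video; audio] + b). The paper's
    actual gated attention and calibration head are not reproduced here.
    """
    if video is None and audio is None:
        raise ValueError("at least one modality is required")
    # Unimodal fallback: pass the available embedding through unchanged.
    if audio is None:
        return list(video)
    if video is None:
        return list(audio)
    if len(video) != len(audio):
        raise ValueError("embeddings must have the same dimension")
    if len(gate_weights) != 2 * len(video):
        raise ValueError("gate_weights must cover the concatenated embedding")

    concat = list(video) + list(audio)
    gate = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    # Convex combination: gate weights the video stream, (1 - gate) the audio.
    return [gate * v + (1.0 - gate) * a for v, a in zip(video, audio)]


if __name__ == "__main__":
    # With zero weights and zero bias the gate is 0.5 (a plain average).
    print(gated_fusion([1.0, 3.0], [3.0, 1.0], [0.0] * 4))
    # Missing audio: the video embedding is returned as-is.
    print(gated_fusion([1.0, 2.0], None, [0.0] * 4))
```

In a trained system the gate parameters would be learned jointly with the encoders; here they are passed in by hand so the behavior is easy to inspect.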



Cite this article

[IEEE Style]
N. Yan, K. Han, S. Shin, "A Comprehensive Deepfake Detection Framework Based on Multimodal Liveness Detection and Deep Learning Integration," KSII Transactions on Internet and Information Systems, vol. 20, no. 1, pp. 354-377, 2026. DOI: 10.3837/tiis.2026.01.016.

[ACM Style]
Ning Yan, Kunhee Han, and Seungsoo Shin. 2026. A Comprehensive Deepfake Detection Framework Based on Multimodal Liveness Detection and Deep Learning Integration. KSII Transactions on Internet and Information Systems, 20, 1, (2026), 354-377. DOI: 10.3837/tiis.2026.01.016.

[BibTeX Style]
@article{tiis:105661, title="A Comprehensive Deepfake Detection Framework Based on Multimodal Liveness Detection and Deep Learning Integration", author="Ning Yan and Kunhee Han and Seungsoo Shin", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2026.01.016}, volume={20}, number={1}, year="2026", month={January}, pages={354-377}}