Deepfake video generation has become increasingly sophisticated, posing challenges for detection methods that rely solely on convolutional neural networks (CNNs) without explicit texture enhancement. Many existing approaches struggle to capture the subtle texture inconsistencies introduced by manipulation, compression, and noise. This study investigates the integration of Local Ternary Pattern (LTP)–based texture enhancement with transfer learning models for deepfake video detection. Specifically, the VGG16 and ResNet50 architectures are evaluated on the Celeb-DF (v2) dataset. LTP is employed to extract fine-grained texture features because it is more robust to illumination variations and noise than conventional descriptors such as Local Binary Pattern (LBP). Video frames are processed individually and used to train the CNN classifiers, which are then evaluated at both the frame and video levels. Experimental results show that ResNet50 outperforms VGG16, achieving a test accuracy of 93% with a validation loss of 0.2228, while VGG16 reaches an accuracy of 88% with a validation loss of 0.2636. Further testing on 20 withheld videos shows that ResNet50 correctly classifies all samples, whereas VGG16 misclassifies two real videos, indicating a greater tendency to mislabel authentic content. These results demonstrate that LTP-based texture enhancement effectively supports CNN-based deepfake detection and that deeper architectures benefit more from enriched texture representations. The study provides empirical insights into improving the robustness and reliability of deepfake video classification.
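To make the LTP step concrete, the sketch below encodes an 8-bit grayscale patch with the standard ternary rule: a neighbour more than a threshold t above the centre contributes +1, more than t below contributes −1, and 0 otherwise, with the ternary code split into the usual upper and lower binary halves. This is a minimal NumPy illustration under assumed parameters (8-neighbourhood, t = 5), not the exact preprocessing pipeline used in the study.

```python
import numpy as np

def ltp_patterns(gray, t=5):
    """Upper/lower Local Ternary Pattern codes for an 8-bit grayscale
    image over the 8-neighbourhood (illustrative sketch, threshold t)."""
    g = gray.astype(np.int16)            # avoid uint8 overflow in c + t
    c = g[1:-1, 1:-1]                    # centre pixels (border dropped)
    # 8 neighbour offsets, clockwise from top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    upper = np.zeros_like(c)
    lower = np.zeros_like(c)
    for k, (dy, dx) in enumerate(offsets):
        n = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        upper |= (n >= c + t).astype(np.int16) << k   # ternary +1 bits
        lower |= (n <= c - t).astype(np.int16) << k   # ternary -1 bits
    return upper.astype(np.uint8), lower.astype(np.uint8)

# Tiny demo: a flat patch has no ternary transitions, so both codes are 0
img = np.full((5, 5), 100, dtype=np.uint8)
up, lo = ltp_patterns(img)
print(up.max(), lo.max())  # 0 0
```

The two resulting code maps can be fed to the CNN as enhanced texture channels; the threshold t is what gives LTP its tolerance to small illumination and noise perturbations that would flip bits in plain LBP.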
Copyright © 2026