Advances in artificial intelligence (AI) have made it possible to create synthetic images that closely resemble real ones, which makes such images difficult to detect and classify. This study develops classification models based on EfficientNet-B0 and the Vision Transformer (ViT) to distinguish real images from images produced by generative AI. The dataset consists of 30,401 real images from the MSCOCO 2017 dataset and 30,401 AI-generated images from the SyntheticEye AI-Generated Images Dataset on Kaggle, giving a balanced set of 60,802 images. The ViT model achieved 98% classification accuracy and EfficientNet-B0 achieved 96%. Both models therefore show strong potential for detecting digital media manipulation, with ViT performing better. In practical terms, the findings support the development of more advanced tools for detecting AI-generated images in applications such as digital security and media verification.
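The accuracy figures reported above are the fraction of images assigned the correct class. A minimal sketch of that calculation for a two-class (real versus AI-generated) task follows. The label encoding, the function names, and the toy labels are assumptions made for illustration and are not taken from the study, which does not describe its evaluation code here.

```python
from collections import Counter

# Hypothetical label encoding, not taken from the study
REAL, SYNTHETIC = 0, 1


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs for a binary task."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of images whose predicted label matches the true label."""
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    counts = confusion_counts(y_true, y_pred)
    correct = counts[(REAL, REAL)] + counts[(SYNTHETIC, SYNTHETIC)]
    return correct / len(y_true)


# Toy, made-up labels for illustration only (not results from the study):
# four real and four synthetic images, with one real image misclassified.
y_true = [REAL, REAL, REAL, REAL, SYNTHETIC, SYNTHETIC, SYNTHETIC, SYNTHETIC]
y_pred = [REAL, REAL, REAL, SYNTHETIC, SYNTHETIC, SYNTHETIC, SYNTHETIC, SYNTHETIC]
toy_accuracy = accuracy(y_true, y_pred)  # 7 of 8 predictions are correct
```

Because the two classes are the same size, a classifier that always predicts one class would score 50% on this measure, which is the baseline against which the reported accuracies can be read.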
Copyrights © 2025