Syntax: Journal of Software Engineering, Computer Science and Information Technology
Vol 6, No 1 (2025): June 2025

IMPLEMENTATION OF CONVOLUTIONAL NEURAL NETWORK (CNN) AND CONTRASTIVE LANGUAGE-IMAGE PRETRAINING (CLIP) FOR FILM GENRE PREDICTION BASED ON POSTER ANALYSIS

Windu Wiwaha, Sebastian Kurniawan (Unknown)
Pinaryanto, Kartono (Unknown)



Article Info

Publish Date
30 Jun 2025

Abstract

The film industry continues to grow rapidly, producing thousands of films each year. Film genre classification has become crucial for cataloguing and for recommendation systems. Film posters, as the primary visual element, often signal genre through objects, colors, and design, but textual information such as the plot is also significant. This study compares the performance of a Convolutional Neural Network (CNN) and Contrastive Language-Image Pretraining (CLIP) in multi-label film genre classification using poster and plot analysis. A dataset drawn from IMDb and OMDb was put through a preprocessing stage. The CNN model used the BiT-ResNet50 architecture, while CLIP used ViT-B/16, ViT-L/14, and RN50x16 for posters, with BERT for plot analysis. Experiments varied the batch size, learning rate, and optimizer. CLIP (ViT-L/14) performed best, with 83.2% accuracy and a Hamming Loss of 0.1678, compared with 77.9% accuracy for the CNN. Integrating plot analysis with BERT improved accuracy by approximately 5% over poster-only methods. The study demonstrates that combining a vision-language model (CLIP) with text analysis (BERT) is more effective than a conventional CNN for film genre classification.

Keywords: film genre classification, CNN, CLIP, deep learning, movie posters, multi-label classification.
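The abstract reports accuracy alongside Hamming Loss for a multi-label task. The sketch below shows one common way such numbers are computed: per-genre scores are thresholded into 0/1 labels, and Hamming Loss is the fraction of (film, genre) slots predicted wrongly. It assumes that "accuracy" means per-label accuracy, which the abstract does not define. Under that reading the two metrics are complements, and 1 - 0.1678 = 0.8322 is consistent with the reported 83.2%. The genre list, the scores, the 0.5 threshold, and all function names are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

# Hypothetical label set; the paper's actual genre list is not given in the abstract.
GENRES = ["Action", "Comedy", "Drama", "Horror"]


def binarize(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn per-genre scores in [0, 1] into 0/1 multi-label predictions."""
    return [1 if s >= threshold else 0 for s in scores]


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (film, genre) label slots that are predicted wrongly."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each film needs one prediction per genre")
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    if total == 0:
        raise ValueError("no labels to evaluate")
    return wrong / total


def label_accuracy(y_true: Sequence[Sequence[int]],
                   y_pred: Sequence[Sequence[int]]) -> float:
    """Per-label accuracy, defined here as the complement of Hamming Loss."""
    return 1.0 - hamming_loss(y_true, y_pred)


# Toy data: three films, four genres each (made up for illustration).
y_true = [
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]
scores = [
    [0.9, 0.2, 0.4, 0.1],
    [0.3, 0.8, 0.6, 0.1],
    [0.1, 0.2, 0.7, 0.9],
]
y_pred = [binarize(row) for row in scores]

loss = hamming_loss(y_true, y_pred)
accuracy = label_accuracy(y_true, y_pred)
print(f"Hamming Loss: {loss:.4f}, per-label accuracy: {accuracy:.4f}")
```

A stricter alternative is exact-match (subset) accuracy, which counts a film as correct only when every genre label matches. It is usually much lower than per-label accuracy, so the metric definition matters when comparing results across studies.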

Copyrights © 2025






Journal Info

Abbrev

syntax

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering

Description

Syntax: Journal of Software Engineering, Computer Science and Information Technology is a scientific journal managed and published by the Software Engineering Study Program (Program Studi Rekayasa Perangkat Lunak), Faculty of Engineering and Computer Science, Universitas Dharmawangsa, Medan, Indonesia. The journal covers topics ...