Dina Zatusiva Haq
Universitas Pembangunan Nasional “Veteran” Jawa Timur

Published: 1 document

Enhancing Clickbait Headline Identification Performance Without Preprocessing Through Feature Reduction and Sentiment Analysis
Moch Deny Pratama; Anisa Nur Azizah; Misbachul Falach Asy'ari; Dimas Novian Aditia Syahputra; M Adamu Islam Mashuri; Binti Kholifah; Rifqi Abdillah; Adinda Putri Pratiwi; Dina Zatusiva Haq
Journal of Applied Informatics Research Vol. 1 No. 1 (2025): July
Publisher : Universitas Negeri Surabaya

DOI: 10.26740/jair.v1i1.44659

Abstract

This study addresses the challenge of identifying clickbait headlines without conventional text preprocessing, which can be resource-intensive and may degrade contextual integrity. To improve detection performance, we examine three feature extraction methods: TF-IDF, Word2Vec, and Headline2Vec, an embedding technique designed for short texts such as headlines. The extracted features are then reduced with feature selection algorithms, namely Pearson Correlation Coefficient (PCC), Neighborhood Component Analysis (NCA), and Relief, to lower dimensionality while retaining the most relevant signal. Sentiment polarity is also integrated as a complementary feature. A comparative evaluation is conducted with several machine learning classifiers, namely Support Vector Classifier (SVC), Random Forest, LightGBM, and XGBoost, across all combinations of feature extraction and selection methods. Results show that the best configuration, Headline2Vec with Relief and SVC, achieves the highest accuracy at 94.40%, outperforming the other approaches. This demonstrates the effectiveness of combining semantic vectorization and feature selection for clickbait detection in the absence of traditional preprocessing. The findings support the development of streamlined, scalable classification models that maintain high accuracy while reducing preprocessing overhead, making the proposed method particularly suitable for real-time, large-scale content moderation and news verification systems.
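The abstract names Relief and PCC as feature selectors but gives no implementation details. The following is a minimal pure-Python sketch, not the authors' code, of how these two filter scores can rank feature columns before classification. It assumes a basic two-class Relief that visits every instance once (instead of sampling instances at random) and uses absolute Pearson correlation with the label for PCC. The function names (`relief_weights`, `pearson_scores`, `top_k`), the toy matrix, and the treatment of the third column as a sentiment-polarity score are hypothetical illustrations. NCA, TF-IDF, Word2Vec, and Headline2Vec are not reproduced here.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def _feature_ranges(X: Matrix) -> List[float]:
    """Max minus min per column; constant columns get 1.0 to avoid dividing by zero."""
    ranges = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        span = max(col) - min(col)
        ranges.append(span if span > 0 else 1.0)
    return ranges


def relief_weights(X: Matrix, y: Sequence[int]) -> List[float]:
    """Basic two-class Relief (hypothetical helper, deterministic variant).

    Every instance is visited once instead of sampling m instances at random.
    Nearest neighbours use the Manhattan distance on range-normalised features.
    """
    n, d = len(X), len(X[0])
    ranges = _feature_ranges(X)

    def diff(j: int, a: Sequence[float], b: Sequence[float]) -> float:
        return abs(a[j] - b[j]) / ranges[j]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i, xi in enumerate(X):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no neighbour of one kind, so this instance gives no update
        near_hit = X[min(hits, key=lambda k: dist(xi, X[k]))]
        near_miss = X[min(misses, key=lambda k: dist(xi, X[k]))]
        for j in range(d):
            # Reward features that differ across classes, penalise ones that vary within a class.
            weights[j] += (diff(j, xi, near_miss) - diff(j, xi, near_hit)) / n
    return weights


def pearson_scores(X: Matrix, y: Sequence[float]) -> List[float]:
    """Absolute Pearson correlation between each feature column and the label."""
    n = len(X)
    my = sum(y) / n
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx = sum(col) / n
        sx = math.sqrt(sum((v - mx) ** 2 for v in col))
        if sx == 0 or sy == 0:
            scores.append(0.0)  # correlation is undefined for a constant column
            continue
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        scores.append(abs(cov / (sx * sy)))
    return scores


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical toy data: columns 0 and 2 track the label, column 1 is noise.
    # Column 2 stands in for a sentiment-polarity score in [-1, 1].
    X = [[1.0, 0.3, 0.9], [0.9, 0.8, 0.7], [0.8, 0.1, 0.8],
         [0.1, 0.4, -0.6], [0.0, 0.9, -0.8], [0.2, 0.2, -0.7]]
    y = [1, 1, 1, 0, 0, 0]  # 1 = clickbait, 0 = not clickbait (hypothetical labels)
    print("Relief keeps columns:", top_k(relief_weights(X, y), 2))
    print("PCC keeps columns:", top_k(pearson_scores(X, y), 2))
```

On this toy matrix both scores discard the noisy column. Relief scores a feature by how well it separates each instance from its nearest opposite-class neighbour, whereas PCC scores each column in isolation against the label. This difference is one plausible reason the two selectors can rank embedding dimensions differently. In a real pipeline the scores would be computed on the training split only, and the kept columns would then be passed to a classifier such as SVC.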