Journal: MDP Student Conference

Heuristic Pseudo-Labeling for Review Quality Classification Using TF-IDF and Ensemble Learning
Kurniawan, Sandyka Dwi; Nurcahyawati, Vivine
MDP Student Conference Vol 5 No 2 (2026): The 5th MDP Student Conference 2026
Publisher : Universitas Multi Data Palembang

DOI: 10.35957/mdp-sc.v5i2.15479

Abstract

Online review datasets rarely provide explicit labels indicating review quality, which limits the use of supervised learning to assess informativeness. This study proposes a heuristic pseudo-labeling framework that automatically assigns review quality labels (High, Medium, Low) to unlabeled text using lexical richness metrics, language patterns, and rule-based spam indicators. The approach combines comprehensive text preprocessing, TF-IDF representation, and classification with Naive Bayes, Logistic Regression, and Random Forest. Experiments on three heterogeneous datasets (Amazon product reviews, movie reviews, and Twitter posts) show that Random Forest achieves the best performance of the three classifiers, confirming the advantage of ensemble learning in modeling the complex textual patterns captured by the pseudo-labels. The novelty of this work lies in transforming measurable textual characteristics into reliable supervisory signals without manual annotation. The framework offers an interpretable and scalable solution for review quality assessment in Big Data environments and provides a methodological foundation for extending review analytics beyond sentiment polarity toward content quality evaluation.
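The pipeline described in the abstract (heuristic labels from lexical richness and spam rules, then TF-IDF features for a classifier) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not publish its rules, so the spam patterns, the type-token ratio as the lexical richness metric, and the thresholds (5 and 20 tokens, TTR of 0.6) are all hypothetical. The TF-IDF step uses smoothed IDF, ln((1 + n) / (1 + df)) + 1, with L2 row normalization, which mirrors the default behavior of scikit-learn's TfidfVectorizer.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL spam indicators; the paper's actual rule set is not given in the abstract.
SPAM_PATTERNS = [
    re.compile(r"https?://\S+"),                                       # embedded links
    re.compile(r"\b(buy now|click here|promo)\b", re.IGNORECASE),      # promotional phrases
    re.compile(r"(.)\1{4,}"),                                          # a character repeated 5+ times
]


def tokenize(text: str) -> list[str]:
    """Simplified preprocessing: lowercase and keep alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Lexical richness: share of distinct tokens among all tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def pseudo_label(text: str) -> str:
    """Assign High/Medium/Low using HYPOTHETICAL thresholds."""
    tokens = tokenize(text)
    if any(p.search(text) for p in SPAM_PATTERNS) or len(tokens) < 5:
        return "Low"
    if len(tokens) >= 20 and type_token_ratio(tokens) >= 0.6:
        return "High"
    return "Medium"


def tfidf(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Smoothed TF-IDF with L2-normalized rows (raw counts as term frequency)."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for d in docs:
        tf = Counter(d)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
        rows.append([v / norm for v in vec])
    return vocab, rows


reviews = [
    "The battery lasts two full days, the screen is sharp in sunlight, "
    "and the charger is compact enough for travel bags.",
    "Good phone overall, it works fine for calls and apps.",
    "BUY NOW!!!!! Free case at http://spam.example",
]
labels = [pseudo_label(r) for r in reviews]
vocab, matrix = tfidf([tokenize(r) for r in reviews])
print(labels)  # ['High', 'Medium', 'Low']
```

In a full pipeline, `labels` and `matrix` would serve as the training targets and features for the classifiers the abstract compares; scikit-learn's `MultinomialNB`, `LogisticRegression`, and `RandomForestClassifier` are one common choice, though the abstract does not name a library.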