CAUCHY: Jurnal Matematika Murni dan Aplikasi
Vol 10, No 2 (2025): CAUCHY: JURNAL MATEMATIKA MURNI DAN APLIKASI

Reliable and Efficient Sentiment Analysis on IMDb with Logistic Regression

Ulya, Diah Mariatul
Juhari, Juhari
Yuliana, Rossima Eva
Jamhuri, Mohammad



Article Info

Publish Date
20 Aug 2025

Abstract

Understanding public opinion at scale is essential for modern media analytics. We present a reproducible, leakage-safe evaluation of logistic regression (LR) for binary sentiment classification on the IMDb Large Movie Review dataset and compare it with five widely used baselines: multinomial Naive Bayes, linear support vector machine (SVM), decision tree, k-nearest neighbors, and random forest. Using a standardized text pipeline (HTML stripping, stopword removal, WordNet lemmatization) with TF–IDF unigrams–bigrams and nested, stratified cross-validation, we assess threshold-dependent and threshold-independent performance, probability calibration, and computational efficiency. LR attains the best overall balance of quality and speed, achieving 88.98% accuracy and 89.13% F1, with strong ranking performance (OOF ROC–AUC ≈ 0.9568; PR–AUC ≈ 0.9554) and well-behaved calibration (Brier ≈ 0.0858). Training completes in seconds per fold and CPU inference reaches about 2.46×10^6 samples per second. While a calibrated linear SVM yields slightly higher precision, LR delivers higher F1 at markedly lower compute. These results establish LR as a robust, transparent baseline that remains competitive with more complex neural and ensemble approaches, offering a favorable performance–efficiency trade-off for practical deployment and reproducible research on IMDb sentiment classification.
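To make the metrics named in the abstract concrete, the following is a minimal sketch of how threshold-dependent scores (accuracy, F1), a threshold-independent ranking score (ROC-AUC), and a calibration score (Brier) can be computed from predicted positive-class probabilities. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the six-review toy data are assumptions made purely for illustration, and PR-AUC is omitted for brevity.

```python
# Illustrative sketch only: helper names, the 0.5 threshold, and the toy data
# are assumptions, not taken from the paper.

def accuracy_f1(y_true, p_pos, threshold=0.5):
    """Threshold-dependent metrics: accuracy and F1 for the positive class."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_pos):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    acc = (tp + tn) / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return acc, f1


def brier_score(y_true, p_pos):
    """Calibration metric: mean squared error between probability and label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pos)) / len(y_true)


def roc_auc(y_true, p_pos):
    """Threshold-independent metric: probability that a random positive is
    scored above a random negative (ties count as one half)."""
    pos = [p for y, p in zip(y_true, p_pos) if y == 1]
    neg = [p for y, p in zip(y_true, p_pos) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = positive review) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
p_pos = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

acc, f1 = accuracy_f1(y_true, p_pos)
brier = brier_score(y_true, p_pos)
auc = roc_auc(y_true, p_pos)
print(f"accuracy={acc:.4f} f1={f1:.4f} brier={brier:.4f} roc_auc={auc:.4f}")
```

In a nested cross-validation setting such as the one the abstract describes, scores like these would typically be computed on out-of-fold predictions, so that no example is scored by a model that saw it during training or tuning.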

Copyrights © 2025






Journal Info

Abbrev

Math

Publisher

Subject

Mathematics

Description

The CAUCHY journal is published regularly two (2) times a year. The editors accept scholarly articles reporting research results, literature reviews, analyses, and problem solving in the field of Mathematics (Algebra, Analysis, Statistics, Computation, and Applied Mathematics). Accepted manuscripts will be reviewed by ...