Ulya, Diah Mariatul
Unknown Affiliation

Reliable and Efficient Sentiment Analysis on IMDb with Logistic Regression
Ulya, Diah Mariatul; Juhari, Juhari; Yuliana, Rossima Eva; Jamhuri, Mohammad
CAUCHY: Jurnal Matematika Murni dan Aplikasi, Vol 10, No 2 (2025)
Publisher : Mathematics Department, Universitas Islam Negeri Maulana Malik Ibrahim Malang

DOI: 10.18860/cauchy.v10i2.33809

Abstract

Understanding public opinion at scale is essential for modern media analytics. We present a reproducible, leakage-safe evaluation of logistic regression (LR) for binary sentiment classification on the IMDb Large Movie Review dataset and compare it with five widely used baselines: multinomial Naive Bayes, linear support vector machine (SVM), decision tree, k-nearest neighbors, and random forest. Using a standardized text pipeline (HTML stripping, stopword removal, WordNet lemmatization) with TF–IDF unigrams–bigrams and nested, stratified cross-validation, we assess threshold-dependent and threshold-independent performance, probability calibration, and computational efficiency. LR attains the best overall balance of quality and speed, achieving 88.98% accuracy and 89.13% F1, with strong ranking performance (OOF ROC–AUC ≈ 0.9568; PR–AUC ≈ 0.9554) and well-behaved calibration (Brier ≈ 0.0858). Training completes in seconds per fold and CPU inference reaches about 2.46×10^6 samples per second. While a calibrated linear SVM yields slightly higher precision, LR delivers higher F1 at markedly lower compute. These results establish LR as a robust, transparent baseline that remains competitive with more complex neural and ensemble approaches, offering a favorable performance–efficiency trade-off for practical deployment and reproducible research on IMDb sentiment classification.
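
The threshold-dependent and calibration metrics named in the abstract (accuracy, F1, Brier score) can be illustrated with a small sketch. This is not the authors' code: it assumes binary labels (1 = positive review) and predicted positive-class probabilities, picks a decision threshold of 0.5 as an assumption, and runs on made-up toy values rather than IMDb data. The function names `brier_score` and `accuracy_and_f1` are hypothetical.

```python
from typing import Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared gap between predicted probability and the 0/1 label.

    Lower is better; 0.0 means perfectly confident, perfectly correct.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def accuracy_and_f1(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Tuple[float, float]:
    """Threshold the probabilities, then compute accuracy and F1."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)
    correct = sum(1 for y, yh in zip(y_true, y_pred) if y == yh)
    accuracy = correct / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
    return accuracy, f1


# Toy values for illustration only (not results from the paper).
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7]

acc, f1 = accuracy_and_f1(y_true, y_prob)
print(round(brier_score(y_true, y_prob), 3))  # 0.178
print(round(acc, 3), round(f1, 3))            # 0.667 0.667
```

Accuracy and F1 change with the chosen threshold, whereas the Brier score is computed directly from the probabilities, which is why it is reported here as a calibration measure alongside the threshold-independent ROC-AUC and PR-AUC.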