CAUCHY: Jurnal Matematika Murni dan Aplikasi

Vol 10, No 2 (2025): CAUCHY: JURNAL MATEMATIKA MURNI DAN APLIKASI

Ulya, Diah Mariatul
Juhari, Juhari
Yuliana, Rossima Eva
Jamhuri, Mohammad

Publish Date
20 Aug 2025

Understanding public opinion at scale is essential for modern media analytics. We present a reproducible, leakage-safe evaluation of logistic regression (LR) for binary sentiment classification on the IMDb Large Movie Review dataset and compare it with five widely used baselines: multinomial Naive Bayes, linear support vector machine (SVM), decision tree, k-nearest neighbors, and random forest. Using a standardized text pipeline (HTML stripping, stopword removal, WordNet lemmatization) with TF–IDF unigrams–bigrams and nested, stratified cross-validation, we assess threshold-dependent and threshold-independent performance, probability calibration, and computational efficiency. LR attains the best overall balance of quality and speed, achieving 88.98% accuracy and 89.13% F1, with strong ranking performance (OOF ROC–AUC ≈ 0.9568; PR–AUC ≈ 0.9554) and well-behaved calibration (Brier ≈ 0.0858). Training completes in seconds per fold and CPU inference reaches about 2.46×10^6 samples per second. While a calibrated linear SVM yields slightly higher precision, LR delivers higher F1 at markedly lower compute. These results establish LR as a robust, transparent baseline that remains competitive with more complex neural and ensemble approaches, offering a favorable performance–efficiency trade-off for practical deployment and reproducible research on IMDb sentiment classification.
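The abstract distinguishes threshold-dependent metrics (accuracy, F1), threshold-independent ranking quality (ROC–AUC), and calibration (Brier score). The following is a minimal pure-Python sketch of how such quantities are commonly computed from labels and predicted probabilities. It is not the authors' code: the function names, the 0.5 decision threshold, and the six-example toy data are assumptions made purely for illustration and are unrelated to the reported IMDb results.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Threshold-dependent metrics: binarize probabilities, then count outcomes."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Calibration: mean squared gap between predicted probability and label."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Threshold-independent ranking quality: the chance that a random positive
    is scored above a random negative, with ties counted as one half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the paper): 1 = positive review, 0 = negative.
labels = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7]

acc, f1 = accuracy_and_f1(labels, probs)
brier = brier_score(labels, probs)
auc = roc_auc(labels, probs)
print(f"accuracy={acc:.4f} f1={f1:.4f} brier={brier:.4f} roc_auc={auc:.4f}")
```

In a cross-validated study such as the one described, these functions would typically be applied to out-of-fold (OOF) probabilities, so that every prediction comes from a model that never saw that example during training. The pairwise ROC–AUC above is quadratic in the number of examples; a rank-based formulation would be the usual choice at IMDb scale.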

Journal Info

CAUCHY: Jurnal Matematika Murni dan Aplikasi

Abbrev

Math

Publisher

Universitas Islam Negeri Maulana Malik Ibrahim Malang

Subject

Mathematics

Description

CAUCHY is published regularly, two (2) times a year. The editors accept scholarly papers presenting research results, literature reviews, analyses, and problem solving in the field of Mathematics (Algebra, Analysis, Statistics, Computation, and Applied Mathematics). Accepted manuscripts will be reviewed by ...

Reliable and Efficient Sentiment Analysis on IMDb with Logistic Regression
