Articles

Comprehensive Comparison of TF-IDF and Word2Vec in Product Sentiment Classification Using Machine Learning Models
Sinaga, Asra Gretya; Robet, Robet; Pribadi, Octara
Journal of Applied Informatics and Computing Vol. 10 No. 1 (2026): February 2026
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v10i1.11582

Abstract

Sentiment analysis supports data-driven decisions by turning product reviews into reliable polarity labels. We compare four text representations for sentiment classification of Indonesian Shopee product reviews from Kaggle (~2.5k texts): TF-IDF, TF-IDF reduced via SVD, Word2Vec trained from scratch, and a hybrid that combines TF-IDF (SVD-300) with Word2Vec. After normalization (with optional emoji handling and Indonesian stemming), ratings are mapped to binary sentiment (≤2 negative, ≥4 positive; 3 discarded). Each representation is evaluated with Logistic Regression, Support Vector Machines (linear and RBF kernels), Naive Bayes, and Random Forest under stratified 5-fold cross-validation. TF-IDF with Logistic Regression (C = 1.0) yields the best results (F1-macro = 0.816 ± 0.026; accuracy = 0.816 ± 0.026), with LinearSVC a strong runner-up. Word2Vec trained from scratch performs worse, consistent with the limited data being insufficient to learn stable embeddings, while the hybrid representation offers only modest gains over Word2Vec and does not surpass TF-IDF. These findings indicate that TF-IDF is the most reliable and consistent representation for small datasets of short review texts, and they underscore the impact of feature design on downstream classification performance.
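The label mapping, TF-IDF weighting, and stratified fold assignment described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `rating_to_label`, `tfidf_vectors`, and `stratified_folds` are hypothetical, the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 with L2 normalization is assumed (it matches the default weighting of scikit-learn's `TfidfVectorizer`, but the abstract does not state which variant was used), and tokenization is a plain whitespace split rather than the paper's normalization and stemming.

```python
import math
import random
from collections import Counter, defaultdict


def rating_to_label(rating):
    """Map a 1-5 star rating to binary sentiment; 3-star reviews return None (discarded)."""
    if rating <= 2:
        return 0  # negative
    if rating >= 4:
        return 1  # positive
    return None  # neutral: dropped from the dataset


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document.

    Assumed variant: raw term frequency times smoothed IDF,
    ln((1 + n) / (1 + df)) + 1, then L2 normalization of each vector.
    Lowercase whitespace tokenization is a hypothetical stand-in for
    the paper's normalization and stemming.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        weights = {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


def stratified_folds(labels, k=5, seed=42):
    """Split example indices into k folds that keep each class's proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a near-equal share of this class.
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)
    return folds


# Toy usage with made-up reviews (not from the paper's dataset).
reviews = [
    ("barang bagus sekali", 5),
    ("pengiriman lambat dan rusak", 1),
    ("biasa saja", 3),
    ("sangat puas dengan kualitas", 4),
]
labeled = [(text, rating_to_label(r)) for text, r in reviews if rating_to_label(r) is not None]
texts = [text for text, _ in labeled]
labels = [label for _, label in labeled]
vectors = tfidf_vectors(texts)
folds = stratified_folds(labels, k=3)  # k=3 only because the toy set is tiny
```

A full run would fit a classifier such as scikit-learn's `LogisticRegression(C=1.0)` on each set of training folds and compute macro-F1 on the held-out fold; that step is left out here to keep the sketch dependency-free.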
Implementing the Transformer Algorithm in an Automatic Text Paraphrasing Application (Penerapan Algoritma Transformer dalam Aplikasi Parafrase Teks Otomatis)
Robet, Robet; Kohsasih, Kelvin Leonardi; Darwin, Jenime
TAMIKA: Jurnal Tugas Akhir Manajemen Informatika & Komputerisasi Akuntansi Vol. 5 No. 1 (2025)
Publisher : Universitas Methodist Indonesia

DOI: 10.46880/tamika.Vol5No1.pp103-109

Abstract

The development of Natural Language Processing (NLP) technology has enabled automated text-manipulation applications, one of which is text paraphrasing. This study implements a Transformer architecture for automatic paraphrasing of Indonesian text. The model is a pre-trained Text-to-Text Transfer Transformer (T5) fine-tuned on an Indonesian text corpus, referred to as the Indo-T5 model. During fine-tuning, the model learns language structure and context so that it generates paraphrases that are grammatically correct and preserve the meaning of the input. Evaluation used the BLEU and ROUGE metrics to measure the similarity between the generated paraphrases and manually written references. The model produces coherent, relevant paraphrases with good lexical variation, achieving a BLEU score of 50.1 and a ROUGE-L score of 61.7. These results show that Transformer-based models can be applied effectively to text paraphrasing in Indonesian.
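To make the two evaluation metrics concrete, here is a minimal sketch of single-reference sentence-level BLEU (clipped n-gram precision up to 4-grams with a brevity penalty) and ROUGE-L F1 (based on the longest common subsequence). It is an illustration, not the authors' evaluation code: the function names are hypothetical, no smoothing is applied, and the abstract does not say whether corpus-level BLEU or a specific toolkit was used, so the reported 50.1 and 61.7 are assumed to be on a 0-100 scale.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Single-reference BLEU: geometric mean of clipped n-gram precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_ngrams.values())
        # Clip each candidate n-gram count by how often it occurs in the reference.
        overlap = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
        if total == 0 or overlap == 0:
            return 0.0  # no smoothing: any zero precision gives BLEU = 0
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum / max_n)


def lcs_length(a, b):
    """Length of the longest common subsequence, using a rolling one-row DP table."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b):
            curr.append(prev[j] + 1 if x == y else max(prev[j + 1], curr[j]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Toy usage with a made-up sentence pair (not from the paper's test set).
candidate = "model ini menghasilkan parafrase yang baik"
reference = "model ini menghasilkan parafrase yang sangat baik"
bleu = 100 * sentence_bleu(candidate, reference)
rouge_l = 100 * rouge_l_f1(candidate, reference)
```

In the toy pair, the candidate is a subsequence of the reference, so ROUGE-L F1 is 12/13 (about 92.3), while BLEU is lower because the missing word breaks several higher-order n-grams.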
Image-Based Age Detection Application Using a Deep Learning Model with a CNN Architecture (Aplikasi Deteksi Usia Berbasis Citra Menggunakan Model Deep Learning dengan Arsitektur CNN)
Robet, Robet; Chandra, Chandra; Setiawan, Jerico
TAMIKA: Jurnal Tugas Akhir Manajemen Informatika & Komputerisasi Akuntansi Vol. 5 No. 1 (2025)
Publisher : Universitas Methodist Indonesia

DOI: 10.46880/tamika.Vol5No1.pp97-102

Abstract

This research designs and implements an application that estimates age from facial images using a deep learning approach based on a Convolutional Neural Network (CNN) architecture. The model is built to recognize and extract facial features in order to estimate an individual's age automatically. Facial image datasets were obtained from public sources and augmented with techniques such as rotation, flipping, and lighting adjustment to increase data variability. The data were split into training, validation, and testing sets, and the model was evaluated using accuracy, precision, recall, and F1-score. The system also detects gender: the gender classifier achieved an accuracy of 82.99%, with a precision of 80.95% for males and 84.47% for females and a recall of 85.15% for males and 80.12% for females. For age detection, precision, recall, and F1-score varied across age groups. Overall, the model performs well in age prediction, though it still has difficulty distinguishing between adjacent age categories.
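The augmentation and per-class evaluation steps can be sketched without an image library by treating a grayscale image as a list of pixel rows. This is a hypothetical illustration, not the authors' pipeline: all helper names are invented, a 90-degree rotation stands in for the arbitrary-angle rotations that normally require interpolation from an image library, and brightness scaling stands in for "lighting adjustment". The `per_class_metrics` helper computes per-class precision, recall, and F1 of the kind reported for the male and female classes and for each age group.

```python
from collections import Counter


def hflip(image):
    """Mirror a grayscale image (list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def rotate90(image):
    """Rotate a grayscale image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def adjust_brightness(image, factor):
    """Scale pixel intensities by `factor`, clamped to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def per_class_metrics(y_true, y_pred):
    """Return ({label: (precision, recall, f1)}, accuracy) for non-empty label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    pred_count, true_count = Counter(y_pred), Counter(y_true)
    metrics = {}
    for label in labels:
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = (precision, recall, f1)
    accuracy = sum(tp.values()) / len(y_true)
    return metrics, accuracy


# Toy usage with made-up predictions (not the paper's results).
y_true = ["male", "male", "female", "female", "female"]
y_pred = ["male", "female", "female", "female", "male"]
metrics, accuracy = per_class_metrics(y_true, y_pred)
```

Computing precision and recall per class, rather than a single pooled score, is what exposes the asymmetry the abstract reports (higher precision but lower recall for females than for males), and the same function applies unchanged to multi-class age groups.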