Indonesian social media platforms, particularly X (formerly Twitter), generate short, highly informal texts that contain linguistic cues useful for demographic inference. Given the scarcity of controlled comparative studies on Indonesian gender prediction, especially with modest datasets, this research evaluates Multinomial Naïve Bayes, a Linear Support Vector Machine (SVM), and a Bidirectional Long Short-Term Memory network (BiLSTM) on a balanced corpus of 478 manually labeled Indonesian-language tweets. These three models were selected to represent classical probabilistic learning, margin-based linear classification, and neural sequence modeling, enabling a methodologically coherent comparison across distinct algorithmic paradigms. The study implemented a unified workflow consisting of manual labeling, structured preprocessing with Sastrawi stemming, RandomOverSampler for class balancing, TF-IDF features for the classical models, and sequence-based tokenization for the BiLSTM. All models were trained and evaluated on a stratified 80:20 split. Experimental results show that the Linear SVM achieved the strongest performance, reaching 0.833 accuracy and 0.832 macro-F1, surpassing Naïve Bayes (0.771 accuracy) and BiLSTM (0.740 accuracy). The SVM also produced the most stable confusion-matrix distribution and superior AUC characteristics, while the BiLSTM exhibited fluctuating validation curves, indicating sensitivity to the limited dataset size. These findings reinforce that classical models, particularly the Linear SVM, remain highly competitive for Indonesian short-text gender classification in low-resource settings and offer practical advantages where computational constraints and data scarcity are prominent. Because the dataset is topically narrow and limited in scale, future work should pursue larger corpora and transformer-based Indonesian models to further improve generalizability and downstream demographic inference.
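The classical pipeline summarized above (TF-IDF features, a Linear SVM, and a stratified 80:20 split) can be sketched roughly as follows. This is not the authors' code: the four-sentence toy corpus and its labels are hypothetical stand-ins for the 478 labeled tweets, Sastrawi stemming is assumed to have been applied upstream, and the RandomOverSampler step is omitted because the toy data is already balanced.

```python
# Rough sketch of the classical branch of the workflow: TF-IDF features
# and a Linear SVM, evaluated on a stratified 80:20 split.
# Toy corpus below is a hypothetical stand-in for the real dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Hypothetical, already-stemmed tweet texts with gender labels.
texts = ["aku suka belanja online", "main bola sama teman",
         "nonton drama korea seru", "servis motor di bengkel"] * 10
labels = ["f", "m", "f", "m"] * 10

# Stratified 80:20 train/test split, as in the study.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42)

vectorizer = TfidfVectorizer()   # unigram TF-IDF features
clf = LinearSVC()                # margin-based linear classifier
clf.fit(vectorizer.fit_transform(X_train), y_train)

pred = clf.predict(vectorizer.transform(X_test))
macro_f1 = f1_score(y_test, pred, average="macro")
print(f"macro-F1: {macro_f1:.3f}")
```

On real data, the reported 0.832 macro-F1 would be computed the same way; the toy corpus here is trivially separable, so its score is not meaningful beyond illustrating the evaluation step.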