This study proposes a model that predicts gender from Indonesian personal names by combining three architectures: Indonesian Bidirectional Encoder Representations from Transformers (IndoBERT), a Convolutional Neural Network (CNN), and Bidirectional Long Short-Term Memory (BiLSTM). Indonesian names are diverse and follow no standardized structure, which makes text-based gender classification difficult. To address this, the hybrid approach leverages the contextual representations of IndoBERT, the local pattern extraction of the CNN, and the sequential dependency modeling of the BiLSTM. The dataset consists of 4,796 student names from Universitas Bumigora, collected between 2018 and 2023. Preprocessing comprises lowercasing, punctuation removal, label encoding, and train-test splitting. Evaluated on accuracy, precision, recall, and F1-score, the IndoBERT-CNN-BiLSTM model achieved the best performance, with an accuracy of 90.94% and an F1-score of 91.03%, and its training remained stable without signs of overfitting. The model is therefore effective for name-based gender classification and has strong potential for applications such as population information systems, service personalization, and name-based demographic analysis.
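The four preprocessing steps named in the abstract (lowercasing, punctuation removal, label encoding, and train-test splitting) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the example names, the "female"/"male" label strings, the 80/20 split ratio, and the random seed are all assumptions made for illustration; the abstract does not specify them.

```python
import random
import string


def clean_name(name: str) -> str:
    """Lowercase a name, drop ASCII punctuation, and collapse extra spaces."""
    lowered = name.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.split())


def encode_labels(labels):
    """Map each distinct label to an integer id (sorted for determinism)."""
    mapping = {label: idx for idx, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


def train_test_split(items, test_ratio=0.2, seed=42):
    """Shuffle a copy of items and split it.

    test_ratio and seed are illustrative assumptions, not values from the study.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records; the real dataset holds 4,796 student names.
records = [
    ("Siti Nur'aini", "female"),
    ("I Made Putra, S.Kom.", "male"),
    ("Dewi Lestari", "female"),
    ("Ahmad Fauzi", "male"),
    ("Ni Luh Ayu", "female"),
]

names = [clean_name(name) for name, _ in records]
label_ids, label_map = encode_labels([label for _, label in records])
train, test = train_test_split(list(zip(names, label_ids)))
```

The cleaned names would then be tokenized for IndoBERT before reaching the CNN and BiLSTM layers; that stage is outside the scope of this sketch.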
Copyrights © 2025