This study presents a Convolutional Neural Network (CNN) model for classifying Indonesian text as offensive or non-offensive, using a dataset of 10,054 tweets collected from Twitter/X. The tweets were manually annotated into the two classes and then cleaned, tokenized, and padded before training. Models were trained for several different numbers of epochs to evaluate how training duration affects performance. The model trained for 70 epochs achieved the best overall performance, with a testing accuracy of 86.73%, precision of 0.8793, recall of 0.8834, F1-score of 0.8814, and ROC-AUC of 0.9208. The confusion matrix indicates strong classification capability for both classes, with slightly better performance on offensive text, which the study attributes to its distinctive lexical patterns. These findings indicate that a CNN architecture with trainable word embeddings is effective for Indonesian offensive-text classification. Future work may integrate pretrained language models or expand the dataset to improve contextual understanding and robustness.
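The cleaning, tokenization, and padding steps mentioned above can be illustrated with a minimal sketch. The abstract does not specify the actual cleaning rules, vocabulary handling, or sequence length, so everything here is an assumption made for illustration: the function names `clean_text`, `build_vocab`, and `encode_and_pad`, the regular expressions, the reserved ids 0 and 1, the `max_len` of 6, and the two made-up sample tweets.

```python
import re

def clean_text(text):
    """Lowercase and strip URLs, user mentions, and non-letter characters (assumed rules)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"@\w+", " ", text)           # drop user mentions
    text = re.sub(r"[^a-z\s]", " ", text)       # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace

def build_vocab(texts):
    """Map each word to an integer id; 0 is reserved for padding, 1 for unknown words."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode_and_pad(text, vocab, max_len):
    """Convert words to ids, truncate to max_len, and right-pad with zeros."""
    ids = [vocab.get(word, 1) for word in text.split()][:max_len]
    return ids + [0] * (max_len - len(ids))

# Hypothetical sample tweets, not taken from the study's dataset.
tweets = ["Selamat pagi @teman! https://t.co/abc", "Hari ini cerah sekali"]
cleaned = [clean_text(t) for t in tweets]
vocab = build_vocab(cleaned)
padded = [encode_and_pad(t, vocab, max_len=6) for t in cleaned]
```

Fixed-length integer sequences of this kind are what an embedding layer typically consumes. Reserving id 0 for padding lets the padded positions be distinguished from real words.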
Copyright © 2025