Cognitive assessment through short-answer essays requires consistent, objective scoring, but manual evaluation is time-consuming and prone to inter-rater variability. Automatic Essay Scoring (AES) has emerged as a promising way to automate this process. This study proposes an optimized Bidirectional Long Short-Term Memory (BiLSTM) model combined with FastText embeddings for Indonesian text classification, trained on data semantically augmented with IndoBERT. The training set was produced by applying the EDA_Synonym_IndoBERT augmentation technique to the UKARA dataset, while the validation and test sets consisted of original, non-augmented responses. The model was optimized by integrating Global Max Pooling to enhance feature representation and class weighting to mitigate class imbalance. The proposed model achieved an accuracy of 93.49% on the validation set and 78.00% on the independent test set. This gap of 15.49 percentage points indicates that, although semantic augmentation increases the diversity of the training data, generalization to previously unseen data remains a challenge. Class weighting also improved the model's ability to recognize minority-class instances, yielding a recall of 92%. These findings demonstrate that architectural optimization and training strategy play a crucial role in improving the performance of Automatic Essay Scoring systems for the Indonesian language.
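The two optimizations named above, class weighting and Global Max Pooling, can be illustrated with a minimal pure-Python sketch. The weighting formula shown (n_samples / (n_classes * class_count), the common "balanced" heuristic) is an assumption, since the abstract does not state which scheme was used. The function names, toy labels, and toy hidden states are all hypothetical and are not taken from the UKARA dataset or from the authors' implementation.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumed scheme: the abstract does not specify how weights were derived.
    Rarer classes receive larger weights, so their errors cost more in the loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def global_max_pool(sequence):
    """Reduce a (time_steps x features) sequence to one vector.

    For each feature, keep the maximum value seen across all time steps,
    which is what global max pooling does over recurrent-layer outputs.
    """
    n_features = len(sequence[0])
    return [max(step[i] for step in sequence) for i in range(n_features)]


# Hypothetical toy labels: six majority-class and two minority-class responses.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(labels)

# Hypothetical hidden states: 3 time steps, 2 features each.
hidden_states = [[0.1, -0.4], [0.7, 0.2], [0.3, 0.9]]
pooled = global_max_pool(hidden_states)

print(weights)
print(pooled)
```

In a deep learning framework, weights of this kind would typically be passed to the training routine so that minority-class errors contribute more to the loss, while the pooling step would sit between the recurrent layer and the classifier.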
Copyright © 2026